
Cloud Functions

DocWeb uses Firebase Cloud Functions (Node.js 20) for its backend logic.

discoverSitemaps

Handles URL discovery using the Waterfall Discovery algorithm.

Memory: 1 GiB | Timeout: 300s

Actions

| Action | Description |
|---|---|
| findSitemaps | Discover sitemap URLs from robots.txt |
| processSitemaps | Parse sitemap XML files |
| crawlSidebar | Extract navigation links |
| discoverAndSave | Run full waterfall discovery and save the results |
| cleanupSession | Archive previous session data |
| updateStatus | Update URL statuses |
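The two discovery inputs named in the table, robots.txt `Sitemap:` directives and sitemap `<loc>` entries, can be sketched as plain string functions. The function names below are hypothetical and this is not DocWeb's actual code; fetching, gzip handling, and recursion into sitemap indexes are left out.

```typescript
// Hypothetical helpers sketching findSitemaps and processSitemaps; not DocWeb's code.

// findSitemaps: collect URLs from `Sitemap:` lines in a robots.txt body.
function sitemapsFromRobots(robotsTxt: string): string[] {
  const urls: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Strip `#` comments, then match the field name case-insensitively.
    const line = rawLine.split("#")[0].trim();
    const match = /^sitemap:\s*(\S+)/i.exec(line);
    if (match) urls.push(match[1]);
  }
  return urls;
}

// processSitemaps: pull <loc> values out of a sitemap or sitemap index.
// A regex is enough for a sketch; a real XML parser is more robust.
function locsFromSitemapXml(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    locs.push(m[1].replace(/&amp;/g, "&")); // undo the most common XML escape
  }
  return locs;
}
```

For a sitemap index, the returned `<loc>` values point at further sitemaps, so a caller would fetch and parse those in turn.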

Request (discoverAndSave)

```typescript
{
  url: string;
  appId: string;
  action: "discoverAndSave";
  searchId?: string;
  depth?: number;
}
```

Response

```typescript
{
  success: boolean;
  urlCount: number;
  savedCount: number;
  sessionId: string;
  rootUrl: string;
  clusters: ClusterStats[];
  source: "waterfall" | "cache";
  discoveryStats: {
    sitemapUrls: number;
    navigationUrls: number;
    navScrapeUrls: number;
    totalUnique: number;
    sources: Record<string, number>;
  };
  fromCache: boolean;
  cacheAge?: number;
}
```
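The `discoveryStats` fields suggest that results from several sources are merged and de-duplicated. Below is a minimal sketch of that merge, assuming every stage runs in a fixed order (sitemap, navigation, navigation scrape), earlier stages take credit for duplicates, and URLs are normalized by dropping the fragment and a trailing slash. The actual waterfall may instead stop early once a stage yields enough URLs; `mergeWaterfall` and `normalizeUrl` are hypothetical names.

```typescript
// Hypothetical sketch of merging waterfall stages into discoveryStats.
type StageName = "sitemap" | "navigation" | "navScrape";

interface DiscoveryStats {
  sitemapUrls: number;
  navigationUrls: number;
  navScrapeUrls: number;
  totalUnique: number;
  sources: Record<string, number>;
}

// Assumed normalization: drop the fragment and any trailing slash (except on "/").
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

function mergeWaterfall(stages: Record<StageName, string[]>): { urls: string[]; stats: DiscoveryStats } {
  const order: StageName[] = ["sitemap", "navigation", "navScrape"];
  const seen = new Set<string>();
  const urls: string[] = [];
  const sources: Record<string, number> = {};
  for (const stage of order) {
    for (const raw of stages[stage]) {
      const key = normalizeUrl(raw);
      if (seen.has(key)) continue; // an earlier stage already claimed this URL
      seen.add(key);
      urls.push(key);
      sources[stage] = (sources[stage] ?? 0) + 1;
    }
  }
  return {
    urls,
    stats: {
      // Raw counts per stage, before de-duplication.
      sitemapUrls: stages.sitemap.length,
      navigationUrls: stages.navigation.length,
      navScrapeUrls: stages.navScrape.length,
      totalUnique: seen.size,
      sources, // unique URLs credited to each stage
    },
  };
}
```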

scrapeNodes

Handles content extraction and Markdown conversion.

Memory: 512 MiB | Timeout: 120s

Actions

| Action | Description | Max nodes |
|---|---|---|
| autoScrape | Scrape structural nodes after discovery | 10 |
| scrapeCluster | Scrape all nodes in a cluster | 20 |
| scrapeNodes | Scrape specific node IDs | Unlimited |
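A sketch of how the per-action caps in the table might be enforced. It assumes the optional `maxNodes` request field can lower an action's cap but never raise it, and represents "Unlimited" as `Infinity`; `effectiveLimit` and `selectNodes` are illustrative names, not part of DocWeb's API.

```typescript
type ScrapeAction = "autoScrape" | "scrapeCluster" | "scrapeNodes";

// Caps per action; Infinity stands in for "Unlimited".
const MAX_NODES: Record<ScrapeAction, number> = {
  autoScrape: 10,
  scrapeCluster: 20,
  scrapeNodes: Infinity,
};

// Assumption: a caller-supplied maxNodes can lower the cap but never raise it.
function effectiveLimit(action: ScrapeAction, maxNodes?: number): number {
  const cap = MAX_NODES[action];
  // Ignore missing or nonsensical values and fall back to the action's cap.
  if (maxNodes === undefined || !isFinite(maxNodes) || maxNodes < 1) return cap;
  return Math.min(cap, Math.floor(maxNodes));
}

function selectNodes<T>(action: ScrapeAction, candidates: T[], maxNodes?: number): T[] {
  // slice() clamps an Infinity end index to the array length, copying everything.
  return candidates.slice(0, effectiveLimit(action, maxNodes));
}
```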

Request

```typescript
{
  action: "autoScrape" | "scrapeCluster" | "scrapeNodes";
  appId: string;
  searchId?: string;
  clusterId?: string;
  docIds?: string[];
  maxNodes?: number;
}
```

Response

```typescript
{
  success: boolean;
  scraped: number;
  failed: number;
  skipped: number;
  results: ScrapeResult[];
}
```
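The response reports `scraped`, `failed`, and `skipped` counts alongside per-node `results`. One plausible way to produce them, sketched under assumptions: the `ScrapeResult` shape is hypothetical (the real type is not documented here), already-scraped nodes are skipped, a failure on one node does not abort the batch, and `success` means the request itself completed. `scrapeOne` stands in for the fetch, extraction, and Markdown conversion step.

```typescript
type ScrapeStatus = "scraped" | "failed" | "skipped";

// Hypothetical shape; the real ScrapeResult type is not shown on this page.
interface ScrapeResult {
  docId: string;
  status: ScrapeStatus;
  error?: string;
}

interface ScrapeResponse {
  success: boolean;
  scraped: number;
  failed: number;
  skipped: number;
  results: ScrapeResult[];
}

async function scrapeBatch(
  docIds: string[],
  alreadyScraped: Set<string>,
  scrapeOne: (docId: string) => Promise<string>, // returns Markdown for the node
): Promise<ScrapeResponse> {
  const results: ScrapeResult[] = [];
  // Nodes are scraped one at a time for simplicity.
  for (const docId of docIds) {
    if (alreadyScraped.has(docId)) {
      results.push({ docId, status: "skipped" });
      continue;
    }
    try {
      await scrapeOne(docId);
      results.push({ docId, status: "scraped" });
    } catch (err) {
      // Record the failure and move on to the next node.
      results.push({ docId, status: "failed", error: err instanceof Error ? err.message : String(err) });
    }
  }
  const count = (status: ScrapeStatus) => results.filter((r) => r.status === status).length;
  // Assumption: success reports that the batch ran, even if some nodes failed.
  return { success: true, scraped: count("scraped"), failed: count("failed"), skipped: count("skipped"), results };
}
```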

chat

Powers the Dex AI chatbot with RAG-based responses.

Memory: 1 GiB | Timeout: 120s

Features

  • Greeting detection with smart topic suggestions
  • Hybrid search (vector + keyword)
  • Just-in-Time (JIT) scraping for unscraped pages
  • Site map awareness
  • Source citations with relevance scores
  • Display of Dex's thinking steps
  • Related page suggestions
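Hybrid search combines a vector similarity score with a keyword score. The sketch below uses cosine similarity plus the fraction of query terms found in a chunk, blended with a weight `alpha`; the scoring formula, the 0.7 weight, and the `Chunk` shape are assumptions for illustration, not DocWeb's documented ranking.

```typescript
// Hypothetical chunk record: text plus its precomputed embedding.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Fraction of distinct query terms that appear in the chunk text.
function keywordScore(query: string, text: string): number {
  const terms = new Set(tokenize(query));
  if (terms.size === 0) return 0;
  const words = new Set(tokenize(text));
  let hits = 0;
  terms.forEach((t) => {
    if (words.has(t)) hits++;
  });
  return hits / terms.size;
}

// Weighted blend of vector and keyword scores; alpha = 0.7 is an assumed default.
function hybridSearch(queryText: string, queryEmbedding: number[], chunks: Chunk[], topK = 3, alpha = 0.7) {
  return chunks
    .map((c) => ({
      id: c.id,
      score: alpha * cosine(queryEmbedding, c.embedding) + (1 - alpha) * keywordScore(queryText, c.text),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The keyword term helps exact identifiers (function names, error codes) that embeddings can blur, while the vector term catches paraphrases.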

Request

```typescript
{
  message: string;
  appId: string;
  searchId: string;
  conversationId?: string;
  siteName?: string;
}
```

Response

```typescript
{
  success: boolean;
  response: string;
  conversationId: string;
  sources: SourceCitation[];
  codeBlocks: CodeBlock[];
  messageId: string;
  sourceNodeIds: string[]; // For graph highlighting
  isGreeting: boolean;
  thinkingSteps: string[]; // Dex's reasoning process
  suggestedTopics?: SuggestedTopic[];
  relatedPages?: RelatedPage[];
}
```
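The `isGreeting` and `suggestedTopics` response fields imply a check that runs before retrieval. A minimal heuristic version, assuming a greeting is a short message consisting only of a greeting word (optionally followed by "Dex" or "there") and that suggested topics are simply the largest clusters; both rules and all names here are hypothetical.

```typescript
// Hypothetical heuristic: the whole message is a greeting word, optionally
// addressed to "Dex" or "there", with nothing else but punctuation.
const GREETING =
  /^(hi|hello|hey|howdy|greetings|good (morning|afternoon|evening))\b[\s!.,?]*(dex|there)?[\s!.,?]*$/i;

function isGreeting(message: string): boolean {
  return GREETING.test(message.trim());
}

interface TopicCluster {
  name: string;
  pageCount: number;
}

// Assumed ranking rule: suggest the clusters with the most pages.
function suggestTopics(clusters: TopicCluster[], limit = 3): string[] {
  return clusters
    .slice()
    .sort((a, b) => b.pageCount - a.pageCount)
    .slice(0, limit)
    .map((c) => c.name);
}
```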

generateEmbeddings

Generates vector embeddings for semantic search.

Memory: 1 GiB | Timeout: 540s

Actions

| Action | Description |
|---|---|
| generateForSearch | Generate embeddings for a search session |
| generateForDoc | Generate the embedding for a single document |
| regenerateAll | Regenerate all embeddings for a session |

Embedding Process

  1. Combine title + description + mainContent
  2. Split the text into 4,000-character chunks with a 400-character overlap
  3. Generate an embedding for each chunk with Gemini text-embedding-004
  4. Store in global collection (deduplicated by URL hash)
  5. Track searchIds for cross-session sharing
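The embedding steps above can be sketched without the Gemini call. This assumes SHA-256 as the URL hash and an in-memory `Map` in place of the global collection; step 3 (calling text-embedding-004 for each chunk) is omitted, and `chunkText`, `urlHash`, and `upsertEmbeddingDoc` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Chunking parameters from the embedding steps.
const CHUNK_SIZE = 4000;
const CHUNK_OVERLAP = 400;

// Step 2: fixed-size character windows; each window starts CHUNK_SIZE - CHUNK_OVERLAP
// characters after the previous one, so neighbours share CHUNK_OVERLAP characters.
function chunkText(text: string, size = CHUNK_SIZE, overlap = CHUNK_OVERLAP): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reaches the end
  }
  return chunks;
}

// Step 4: key global documents by a hash of the URL (SHA-256 is an assumption).
function urlHash(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

interface GlobalEmbeddingDoc {
  url: string;
  chunks: string[]; // in the real flow each chunk is also embedded (step 3, omitted)
  searchIds: Set<string>; // step 5: sessions that share this document
}

const globalStore = new Map<string, GlobalEmbeddingDoc>();

function upsertEmbeddingDoc(
  url: string,
  title: string,
  description: string,
  mainContent: string,
  searchId: string,
): GlobalEmbeddingDoc {
  const key = urlHash(url);
  const existing = globalStore.get(key);
  if (existing) {
    // Already processed for another session: only record the new searchId.
    existing.searchIds.add(searchId);
    return existing;
  }
  // Step 1: combine the fields, skipping empty ones.
  const text = [title, description, mainContent].filter((part) => part.length > 0).join("\n\n");
  const doc: GlobalEmbeddingDoc = { url, chunks: chunkText(text), searchIds: new Set([searchId]) };
  globalStore.set(key, doc);
  return doc;
}
```

With these parameters consecutive chunks start 3,600 characters apart, so a 10,000-character page produces three chunks covering characters 0 to 4,000, 3,600 to 7,600, and 7,200 to 10,000.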

Session Management

getSessions

Returns all saved sessions for the authenticated user.

deleteSession

Deletes a session along with its associated URLs and conversations.

updateSessionAccess

Updates the session's lastAccessedAt timestamp when the user switches sessions.
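An in-memory sketch of the three session functions, useful for seeing how they relate. It assumes each session belongs to one user, `getSessions` orders results by `lastAccessedAt` (most recent first), and only the owner may delete or touch a session; `SessionStore` and its fields are hypothetical stand-ins for the real stored data.

```typescript
// Hypothetical session record.
interface Session {
  id: string;
  userId: string;
  rootUrl: string;
  lastAccessedAt: number; // epoch milliseconds
}

class SessionStore {
  private sessions = new Map<string, Session>();
  private urls = new Map<string, string[]>(); // sessionId -> discovered URLs
  private conversations = new Map<string, string[]>(); // sessionId -> conversation IDs

  add(session: Session, urls: string[] = [], conversationIds: string[] = []): void {
    this.sessions.set(session.id, session);
    this.urls.set(session.id, urls);
    this.conversations.set(session.id, conversationIds);
  }

  // getSessions: the user's sessions, most recently accessed first (assumed order).
  getSessions(userId: string): Session[] {
    return Array.from(this.sessions.values())
      .filter((s) => s.userId === userId)
      .sort((a, b) => b.lastAccessedAt - a.lastAccessedAt);
  }

  // deleteSession: remove the session and cascade to its URLs and conversations.
  deleteSession(userId: string, sessionId: string): boolean {
    const s = this.sessions.get(sessionId);
    if (!s || s.userId !== userId) return false; // only the owner may delete
    this.sessions.delete(sessionId);
    this.urls.delete(sessionId);
    this.conversations.delete(sessionId);
    return true;
  }

  // updateSessionAccess: bump lastAccessedAt when the user switches to a session.
  updateSessionAccess(userId: string, sessionId: string, now = Date.now()): void {
    const s = this.sessions.get(sessionId);
    if (s && s.userId === userId) s.lastAccessedAt = now;
  }

  // Number of URL and conversation records still attached to a session.
  relatedCount(sessionId: string): number {
    return (this.urls.get(sessionId)?.length ?? 0) + (this.conversations.get(sessionId)?.length ?? 0);
  }
}
```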


Credits & Billing

getUserCredits

Returns the user's credit profile, including tier, remaining credits, and reset time.

checkCredits

Verifies that the user has sufficient credits for an action.

deductCredits

Deducts credits after an action succeeds.
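Credits are checked before an action and deducted only after it succeeds, so a failed action should cost nothing. A minimal sketch of that ordering, with a hypothetical `CreditProfile` shape and a `runWithCredits` helper that is not part of DocWeb's API:

```typescript
// Hypothetical credit profile; field names are assumptions.
interface CreditProfile {
  tier: string;
  remaining: number;
  resetAt: Date;
}

type CreditResult<T> = { ok: true; value: T } | { ok: false; reason: string };

async function runWithCredits<T>(
  profile: CreditProfile,
  cost: number,
  action: () => Promise<T>,
): Promise<CreditResult<T>> {
  // checkCredits: refuse up front if the balance cannot cover the cost.
  if (profile.remaining < cost) return { ok: false, reason: "insufficient credits" };
  try {
    const value = await action();
    // deductCredits: charge only after the action succeeded.
    profile.remaining -= cost;
    return { ok: true, value };
  } catch (err) {
    // Failed actions are not charged.
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Because the check and the deduction are separate steps, two concurrent requests could both pass the check. A production version would likely re-check and decrement in one atomic update, for example inside a Firestore transaction.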

createCheckoutSession

Creates a Stripe Checkout session for a subscription upgrade.

createCustomerPortal

Creates a Stripe customer portal link for managing the subscription.

getSubscriptionStatus

Returns the current subscription status.

stripeWebhook

Handles Stripe webhook events:

  • checkout.session.completed
  • customer.subscription.updated
  • customer.subscription.deleted
  • invoice.payment_succeeded
  • invoice.payment_failed
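A sketch of how the listed events might map to a stored subscription status. The `WebhookEvent` type is a pared-down local stand-in for Stripe's event object (`type` plus `data.object`), and the status mapping and in-memory `Map` are assumptions. A real handler must first verify the `Stripe-Signature` header, for example with `stripe.webhooks.constructEvent` from the official `stripe` Node library; that step is omitted here.

```typescript
// Hypothetical local types; real payloads are Stripe Event objects.
type SubscriptionStatus = "active" | "past_due" | "canceled" | "none";

interface WebhookEvent {
  type: string;
  data: { object: { customer: string; status?: string } };
}

const statusByCustomer = new Map<string, SubscriptionStatus>();

// Assumed mapping from Stripe subscription statuses to the app's coarser states.
function mapStatus(s?: string): SubscriptionStatus {
  if (s === "active" || s === "trialing") return "active";
  if (s === "past_due" || s === "unpaid") return "past_due";
  if (s === "canceled" || s === "incomplete_expired") return "canceled";
  return "none";
}

// Returns true if the event type was handled.
function handleStripeEvent(event: WebhookEvent): boolean {
  const { customer, status } = event.data.object;
  switch (event.type) {
    case "checkout.session.completed":
    case "invoice.payment_succeeded":
      statusByCustomer.set(customer, "active");
      return true;
    case "customer.subscription.updated":
      statusByCustomer.set(customer, mapStatus(status));
      return true;
    case "customer.subscription.deleted":
      statusByCustomer.set(customer, "canceled");
      return true;
    case "invoice.payment_failed":
      statusByCustomer.set(customer, "past_due");
      return true;
    default:
      return false; // ignore event types this endpoint does not handle
  }
}
```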

Cache Admin

getCachedDomains

Lists all cached domains with metadata.

getCachedPages

Lists cached pages for a specific domain.

deleteCachedDomain

Clears all cached pages for a domain.

deleteCachedPage

Clears the cache for a single page URL.
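The four cache admin functions can be modeled over a single page cache keyed by URL. This sketch assumes pages are grouped into domains by hostname, that keys ignore URL fragments, and that the domain metadata is a page count plus the latest cache timestamp; `PageCache` is a hypothetical class, not DocWeb's storage layer.

```typescript
// Hypothetical cache entry.
interface CachedPage {
  url: string;
  cachedAt: number; // epoch milliseconds
}

interface DomainSummary {
  domain: string;
  pageCount: number;
  lastCachedAt: number;
}

class PageCache {
  private pages = new Map<string, CachedPage>(); // normalized URL -> entry

  // Assumed key: the URL without its fragment.
  private static key(url: string): string {
    const u = new URL(url);
    u.hash = "";
    return u.toString();
  }

  put(url: string, cachedAt = Date.now()): void {
    const key = PageCache.key(url);
    this.pages.set(key, { url: key, cachedAt });
  }

  // getCachedDomains: group pages by hostname with simple metadata.
  getCachedDomains(): DomainSummary[] {
    const byDomain = new Map<string, DomainSummary>();
    this.pages.forEach((p) => {
      const domain = new URL(p.url).hostname;
      const entry = byDomain.get(domain) ?? { domain, pageCount: 0, lastCachedAt: 0 };
      entry.pageCount++;
      entry.lastCachedAt = Math.max(entry.lastCachedAt, p.cachedAt);
      byDomain.set(domain, entry);
    });
    return Array.from(byDomain.values());
  }

  // getCachedPages: all pages whose hostname matches the domain.
  getCachedPages(domain: string): CachedPage[] {
    return Array.from(this.pages.values()).filter((p) => new URL(p.url).hostname === domain);
  }

  // deleteCachedDomain: remove every page for the domain; returns the count removed.
  deleteCachedDomain(domain: string): number {
    const doomed = this.getCachedPages(domain);
    for (const page of doomed) this.pages.delete(page.url);
    return doomed.length;
  }

  // deleteCachedPage: remove a single page; returns whether it existed.
  deleteCachedPage(url: string): boolean {
    return this.pages.delete(PageCache.key(url));
  }
}
```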