Skip to main content

Glossary

Key terms and concepts used throughout DocWeb documentation.

A

Anchor Node

A URL discovered from navigation elements (nav, aside, header) that serves as a structural reference point in the graph. Anchor nodes have an importanceScore of 1.0.

B

BM25

Best Matching 25 - a probabilistic ranking function used for keyword-based search. DocWeb uses BM25 alongside vector search for hybrid retrieval.

C

Cluster

A grouping of related URLs based on their path segments and topic keywords. URLs can belong to multiple overlapping clusters (e.g., /products/api-gateway belongs to both /products and /api).

Cluster Hull

A visual boundary drawn around nodes belonging to the same cluster, calculated using d3-polygon convex hulls.

Content Category

Classification of a URL as primary (core documentation), support (FAQs, guides), utility (search, login), or unknown.

D

Dex

DocWeb's AI chatbot assistant powered by Google Gemini 2.5 Flash. Dex provides source-cited answers using RAG (Retrieval-Augmented Generation).

Discovery

The process of finding and cataloging all URLs on a documentation site using the Waterfall Discovery algorithm.

E

Embedding

A 768-dimensional vector representation of text content, generated using Google's text-embedding-004 model. Used for semantic similarity search.

F

ForceAtlas2

A physics-based graph layout algorithm that positions related nodes closer together. Used in the Sigma.js Visualization.

G

Global Cache

Shared caching layer that stores discovered URLs, scraped content, and embeddings across all users. Reduces redundant crawling and improves performance.

H

Search methodology combining vector similarity (70%) and BM25 keyword matching (30%) for optimal retrieval accuracy.

J

JIT Scraping

Just-in-Time scraping - when Dex needs content from a page that hasn't been scraped yet, it fetches and indexes it on-demand during the chat interaction.

N

Priority 0 discovery stage that extracts structural links from navigation elements, capturing link text for use as page titles.

Neural Map

The interactive graph visualization of a documentation site's structure, rendered using Sigma.js and WebGL.

P

Priority Score

A 0-100 score (lower = more important) assigned to each URL based on documentation keywords, path depth, and URL patterns.

Score RangePriority Level
0-25Critical
26-45High
46-65Medium
66-100Low

R

RAG

Retrieval-Augmented Generation - an AI technique that grounds LLM responses in retrieved documentation content, eliminating hallucinations.

S

Session

A saved discovery instance containing all URLs, scraped content, chat history, and graph state for a specific documentation site.

Structural Node

A URL found in navigation elements that defines the site's information architecture. These nodes have higher importance scores.

T

Thinking Steps

The reasoning process Dex displays when answering questions, showing how it searched, evaluated sources, and formulated its response.

W

Waterfall Discovery

DocWeb's multi-stage URL discovery algorithm: Nav-Scrape → Sitemap Discovery → Navigation Crawl → Merge & Classify. See Waterfall Discovery.