Chunking Strategy

Fixed, structural, and semantic chunking tradeoffs

Chunk size strongly affects retrieval quality. Tiny chunks lose context; large chunks reduce precision and waste context window budget. The right strategy depends on corpus structure and query style — and should always be validated empirically.

```mermaid
flowchart TD
    D[Document] --> A{Strategy?}
    A -->|uniform corpus| F["Fixed-Size Tokens<br/>with overlap"]
    A -->|natural language| S[Sentence-Aware]
    A -->|structured docs| H[Paragraph / Heading]
    A -->|topic boundaries| SM[Semantic Clustering]
    F --> OV[Overlap Window]
    S --> OV
    H --> OV
    SM --> OV
    OV --> EVAL{Measure Recall}
    EVAL -->|needs tuning| A
    EVAL -->|acceptable| IDX[(Index)]
```

Chunking is the step that decides the unit of retrieval. When a user asks a question, the system retrieves chunks — not full documents. If chunks are too small, each one lacks enough context to be useful. If chunks are too large, a single retrieved chunk may contain the answer buried in irrelevant paragraphs, and you waste tokens passing noise to the generator. There is no universally correct chunk size; it is a hyperparameter you tune per corpus.
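The context-window cost is easy to quantify. A back-of-envelope sketch (all numbers are illustrative assumptions, not recommendations) shows how chunk size and retrieval depth compete for the generator's budget:

```python
# Every token a retrieved chunk occupies is unavailable to the prompt
# and the answer. All figures below are assumed for illustration.
CONTEXT_WINDOW = 8192      # generator's total token window
PROMPT_AND_ANSWER = 1500   # system prompt, user question, generation headroom
TOP_K = 5                  # chunks passed to the generator per query

max_chunk_size = (CONTEXT_WINDOW - PROMPT_AND_ANSWER) // TOP_K
print(max_chunk_size)      # 1338 -> a ceiling on chunk size for this setup
```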

Common approaches range from simple to sophisticated. Fixed-size token chunks with a configurable overlap are easy to implement and work well for uniform corpora like API docs. Sentence-aware chunking respects sentence boundaries and handles variable-length content better. Paragraph or heading-based chunking aligns splits with author-intended semantic units and is often the best starting point for structured documents. Semantic chunking groups sentences by embedding similarity to find topic boundaries, but it requires embedding every sentence up front, which adds computational cost.
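To make the simplest of these concrete, here is a minimal sketch of fixed-size chunking with overlap. It uses whitespace tokens as a stand-in for model tokens (a real pipeline would count tokens with the embedding model's own tokenizer), and the default size and overlap are placeholder values to tune, not recommendations.

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` tokens; consecutive windows
    share `overlap` tokens so facts near a boundary land in both chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()  # whitespace tokens stand in for model tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # final window reached the end
            break
    return chunks
```

The early break avoids emitting a trailing chunk that is nothing but overlap from the previous one.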

The most important rule is to align chunks with human-readable boundaries whenever possible. A chunk that ends mid-sentence or mid-code-block is almost always a retrieval liability. Overlap windows — where the tail of one chunk repeats as the head of the next — improve recall at the cost of mild index bloat. The practical approach is to start with paragraph-based chunking, measure recall on a small gold dataset, and adjust size and overlap based on where retrieval fails.
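As a hedged sketch of that loop, the snippet below splits on blank lines as a stand-in for paragraph boundaries and scores recall@k against a small gold set. The `retrieve(query, k)` callable returning chunk IDs, and the gold format mapping each query to the IDs of chunks containing its answer, are assumptions about your pipeline, not a specific library's API.

```python
import re

def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so chunks align with author-intended units."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def recall_at_k(gold: dict[str, set[str]], retrieve, k: int = 5) -> float:
    """Fraction of gold queries whose top-k results include at least one
    chunk ID labeled relevant for that query."""
    hits = sum(1 for query, relevant in gold.items()
               if set(retrieve(query, k)) & relevant)
    return hits / len(gold)
```

Inspecting the queries that miss tells you which knob to turn: answers spanning two paragraphs argue for larger chunks or more overlap, while retrieved chunks full of unrelated text argue for smaller ones.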