Advanced RAG Patterns

Hybrid search, parent-child retrieval, query decomposition, and agentic loops

Advanced RAG combines lexical and semantic search, uses parent-child retrieval for granular indexing with coherent generation context, decomposes multi-hop questions into sub-queries, and optionally uses agentic retrieval loops — with careful attention to latency and failure paths.

flowchart TD
    Q([User Query]) --> DC[Decompose<br/>Sub-queries]
    DC --> HYB[Hybrid Search]
    HYB --> SEM[Semantic ANN<br/>dense]
    HYB --> LEX[Lexical BM25<br/>sparse]
    SEM --> FUS[Score Fusion<br/>RRF]
    LEX --> FUS
    FUS --> PC[Parent-Child<br/>Expand to Section]
    PC --> AGT{Agentic<br/>Loop?}
    AGT -->|sufficient| GEN[Generate Answer]
    AGT -->|insufficient| Q

Hybrid search is the most reliable single improvement for production RAG. Dense embedding search excels at semantic similarity but misses exact strings such as product codes, version numbers, names, and acronyms. Sparse search with BM25 or TF-IDF is the opposite: strong on exact matches, weak on semantic generalisation. Combining the two with reciprocal rank fusion (RRF) or learned score weighting captures both dimensions. Most production systems will benefit from hybrid search, especially when queries mix natural-language phrasing with exact identifiers.
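To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not part of any particular search engine's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers, best match first.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only ranks, it avoids having to normalise BM25 scores against cosine similarities, which live on different scales. Documents that appear in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever.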

Parent-child retrieval solves a fundamental tension in RAG: you want small chunks for precise retrieval but large, coherent sections for generation. The solution is to index small child chunks for retrieval, then at generation time fetch the larger parent section that contains the matched child. This gives you granular indexing without the incoherence of passing fragmented sentences to the generator. Query decomposition addresses multi-hop questions by breaking "Who founded the company that acquired X?" into sequential sub-queries ("Which company acquired X?", then "Who founded that company?"). Each sub-query can be answered with a single retrieval step, and the answer to the first feeds the second.
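The parent-child lookup described above can be sketched in a few lines. Everything here is illustrative: the `Child` record, the toy corpus, and the word-overlap `score` function stand in for a real chunker and vector index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    child_id: str
    parent_id: str
    text: str


# Hypothetical corpus: full parent sections, plus the small child chunks
# that are actually indexed for retrieval.
parents = {
    "sec-1": "Installation. Run the installer. Then set the API key in config.",
    "sec-2": "Billing. Invoices are issued monthly. Refunds take five days.",
}
children = [
    Child("c1", "sec-1", "Run the installer."),
    Child("c2", "sec-1", "Then set the API key in config."),
    Child("c3", "sec-2", "Invoices are issued monthly."),
    Child("c4", "sec-2", "Refunds take five days."),
]


def score(query, text):
    """Toy relevance: how many query words occur in the chunk."""
    words = set(text.lower().replace(".", "").split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve_parents(query, top_k=2):
    """Rank child chunks, then return each matched parent section once."""
    ranked = sorted(children, key=lambda c: score(query, c.text), reverse=True)
    seen, sections = set(), []
    for child in ranked[:top_k]:
        if score(query, child.text) == 0:
            continue  # skip chunks with no overlap at all
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            sections.append(parents[child.parent_id])
    return sections


context = retrieve_parents("where do I set the api key")
print(context)
```

Both top-ranked children belong to `sec-1`, so the generator receives that section once rather than two overlapping fragments. Deduplicating by parent ID is the detail that keeps the expanded context from ballooning.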

Agentic retrieval, where an LLM decides whether retrieved context is sufficient and iteratively refines its retrieval strategy, is powerful but carries real costs. Each iteration adds latency and an additional LLM call. Failure modes compound: a misguided first retrieval can produce a misguided refinement query. Agentic patterns make sense for complex research tasks where quality matters more than latency. For most production RAG use cases, a well-designed non-agentic pipeline with query decomposition and hybrid search will typically outperform a fragile agentic loop.
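One way to contain those costs is to give the loop a hard iteration budget and an explicit failure signal. The sketch below assumes three hypothetical callables: `retrieve`, `is_sufficient`, and `refine`. In a real system the last two would be LLM calls; here they are toy stubs so the control flow can be run as-is.

```python
from typing import Callable, List, Tuple


def agentic_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> Tuple[List[str], int, bool]:
    """Retrieve, judge, refine; stop when sufficient or out of budget.

    Returns (context, iterations_used, sufficient).
    """
    context: List[str] = []
    current = query
    for iteration in range(1, max_iterations + 1):
        for passage in retrieve(current):
            if passage not in context:
                context.append(passage)
        if is_sufficient(query, context):
            return context, iteration, True
        current = refine(query, context)
    # Budget exhausted: the caller decides how to degrade (e.g. answer with caveats).
    return context, max_iterations, False


# Toy stand-ins (placeholder data, not real facts).
INDEX = {
    "who founded the company that acquired x": ["Acme acquired X."],
    "who founded acme": ["Acme was founded by Jo Doe."],
}


def toy_retrieve(q: str) -> List[str]:
    return INDEX.get(q.lower().rstrip("?"), [])


def toy_is_sufficient(question: str, context: List[str]) -> bool:
    return any("founded by" in p for p in context)


def toy_refine(question: str, context: List[str]) -> str:
    return "Who founded Acme?"


context, used, ok = agentic_retrieve(
    "Who founded the company that acquired X?",
    toy_retrieve, toy_is_sufficient, toy_refine,
)
print(used, ok, context)
```

Returning the `sufficient` flag instead of raising lets the caller choose the failure path. With `max_iterations=1` the same call returns after a single retrieval with `sufficient=False`, which makes the worst-case latency predictable.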