

Retrieval Design and Query Processing

Recall, filtering, and query quality

A user query is often not the best retrieval query. Query rewriting, expansion, and decomposition can significantly improve results. Good retrieval is more about system design — metadata filters, top-k tuning, hybrid search — than raw model choice.

```mermaid
flowchart TD
    UQ([User Query]) --> QR["Query Rewrite<br/>& Expand"]
    QR --> H[Hybrid Search]
    H --> SEM["Semantic ANN<br/>dense vectors"]
    H --> LEX["Lexical BM25<br/>keyword match"]
    SEM --> MF["Merge & Filter<br/>metadata · access"]
    LEX --> MF
    MF --> TOPK[Top-K Candidates]
    TOPK --> LOG[Log for Eval]
```

The biggest retrieval quality lever is often the query itself. Users write queries the way they would ask a colleague — short, colloquial, ambiguous. The embedding of "How do I fix slow checkout?" may not closely match the embeddings of the relevant documentation sections, which might use words like "payment latency" or "cart performance". Query rewriting reformulates the user query into a better retrieval query before sending it to the vector index. Expansion adds related terms. Decomposition splits a compound question into sub-queries that can be retrieved independently.
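A minimal sketch of the expansion and decomposition steps. The synonym map and the "split on and" heuristic are illustrative stand-ins; real systems usually delegate the rewrite to an LLM or a learned query model.

```python
# Hypothetical domain synonym map — not from any real system.
SYNONYMS = {
    "slow": ["latency", "performance"],
    "checkout": ["payment", "cart"],
}

def expand_query(query: str) -> str:
    """Append related terms so lexical search can match document vocabulary."""
    extra = []
    for word in query.lower().split():
        extra.extend(SYNONYMS.get(word.strip("?.,"), []))
    return query if not extra else f"{query} {' '.join(extra)}"

def decompose_query(query: str) -> list[str]:
    """Split a compound question into independently retrievable sub-queries."""
    parts = [p.strip() for p in query.replace("?", "").split(" and ")]
    return [p for p in parts if p]
```

With the toy map above, `expand_query("How do I fix slow checkout?")` yields a query that also carries "latency", "performance", "payment", and "cart", bridging the vocabulary gap described earlier.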

Beyond the query, retrieval design includes several levers, each with meaningful impact on downstream answer quality:

- Metadata filtering: restrict candidates to a specific date range, author, document type, or access tier.
- Top-k tuning: how many candidates to retrieve; too few hurts recall, too many floods the reranker.
- Hybrid search: combine dense semantic search with sparse keyword search for exact term matching.
- Multi-vector strategies: index both the full chunk and a generated question that the chunk answers.
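One common way to merge the dense and sparse result lists in hybrid search is reciprocal rank fusion (RRF): each document scores 1/(k + rank) in every list where it appears, and the sums are sorted. A sketch, assuming each input list is ranked best-first:

```python
def rrf_merge(result_lists: list[list[str]], k: int = 60, top_k: int = 10) -> list[str]:
    """Fuse ranked lists of doc IDs; k=60 is the conventional RRF constant."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A doc ranked well in both lists outranks one that appears in only one.
dense = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
merged = rrf_merge([dense, sparse], top_k=3)  # → ["d1", "d3", "d9"]
```

RRF needs no score calibration between the two retrievers, which is why it is a popular default for merging BM25 and ANN results before metadata filtering and reranking.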

Observability matters as much as the retrieval architecture itself. You should log every query, the top-k candidates returned, whether the answer was ultimately grounded in retrieved context, and any user feedback signals. Retrieval failures and generation failures look the same to the end user but require completely different fixes. A structured eval loop — log, inspect, categorise failures, improve the relevant stage — is the fastest path to a reliable RAG system.