

Prompting, Grounding, and Citation

Building prompts that force grounded, cited answers

Even with excellent retrieval, prompts determine whether the model answers from evidence or synthesises unsupported claims. A strong RAG prompt is explicit about allowed evidence, output format, citation requirements, and what to do when the context is insufficient.

flowchart TD
    CTX["Retrieved Context<br/>with source IDs"] --> PT[Grounded Prompt]
    SYS["System Instruction<br/>answer from context only"] --> PT
    Q([User Question]) --> PT
    PT --> LLM[LLM]
    LLM --> CHK{"Evidence<br/>found?"}
    CHK -->|yes| ANS["Answer + Citations<br/>doc1 · doc2"]
    CHK -->|no| ABS(["Abstain:<br/>not enough evidence"])

A retrieval-augmented system can fetch perfect context and still produce a hallucinated answer if the prompt does not constrain generation correctly. Common failure modes include: the model ignoring retrieved context and answering from parametric memory, citation mismatch (the cited source does not actually contain the stated fact), unsupported synthesis (combining facts from multiple sources into a claim none of them individually support), and refusal failure (the model claims uncertainty even when the answer is clearly in the context).

A strong RAG prompt has four explicit components: an instruction to answer only from provided context; the context itself with clear source delimiters and IDs; the user question; and an output format specification that includes how to cite sources. The instruction should also cover the abstention case: what the model should say when the context does not contain enough evidence. "I don't have enough information to answer this from the provided sources" is far better than a confident hallucination.
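The four components can be sketched as a small prompt builder. This is an illustrative template, not any particular framework's API; the delimiter style, function names, and format wording are all assumptions.

```python
# Sketch of a grounded RAG prompt builder (illustrative names throughout).

ABSTAIN_TEXT = (
    "I don't have enough information to answer this from the provided sources."
)

# Component 1: instruction to answer only from context, including the
# citation requirement and the abstention case.
SYSTEM_INSTRUCTION = (
    "Answer ONLY from the sources below. Cite every claim with its source "
    "ID in square brackets, e.g. [doc1]. If the sources do not contain "
    f'enough evidence, reply exactly: "{ABSTAIN_TEXT}"'
)


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble instruction, delimited context, question, and output format."""
    # Component 2: context with clear source delimiters and IDs.
    context = "\n\n".join(
        f'<source id="{c["id"]}">\n{c["text"]}\n</source>' for c in chunks
    )
    return (
        f"{SYSTEM_INSTRUCTION}\n\n"
        f"SOURCES:\n{context}\n\n"
        # Component 3: the user question.
        f"QUESTION: {question}\n\n"
        # Component 4: output format specification.
        "FORMAT: a concise answer in which every sentence ends with its "
        "citation(s)."
    )


prompt = build_prompt(
    "What is the refund window?",
    [{"id": "doc1", "text": "Refunds are accepted within 30 days."}],
)
```

Keeping the abstention phrasing in a single constant means the same string can be matched verbatim downstream to detect when the model has declined to answer.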

Citation format matters for downstream utility. Simple source IDs like [doc1] are parseable and enable verification workflows. Numbered footnotes work for document-style output. Some systems require the model to quote the supporting span verbatim; this dramatically reduces hallucination because the model cannot synthesise beyond what is literally present. For high-stakes applications like legal or medical RAG, verbatim quotes plus automated verification against the retrieved chunks are worth the extra complexity.
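Verification against the retrieved chunks can be a simple string check once the citation format is parseable. A minimal sketch, assuming a hypothetical answer format of `[doc1: "verbatim quote"]` after each claim; the regex and function names are illustrative:

```python
import re

# Matches citations of the assumed form [doc1: "quoted span"].
CITE_RE = re.compile(r'\[(\w+):\s*"([^"]+)"\]')


def verify_citations(answer: str, chunks: dict[str, str]) -> list[str]:
    """Check every cited quote against its source chunk.

    Returns a list of problems; an empty list means all quotes verify.
    """
    problems = []
    for source_id, quote in CITE_RE.findall(answer):
        if source_id not in chunks:
            # The model cited a source that was never retrieved.
            problems.append(f"unknown source: {source_id}")
        elif quote not in chunks[source_id]:
            # Citation mismatch: the quoted span is not in the cited chunk.
            problems.append(f"quote not found in {source_id}: {quote!r}")
    return problems


chunks = {"doc1": "Refunds are accepted within 30 days of purchase."}
ok = verify_citations('Refunds last 30 days [doc1: "within 30 days"].', chunks)
bad = verify_citations('Refunds are instant [doc1: "instant refunds"].', chunks)
```

Exact substring matching is deliberately strict: any paraphrase fails, which is the point of requiring verbatim quotes. A production system might normalise whitespace before comparing, but fuzzy matching reopens the door to unsupported synthesis.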