A retrieval-augmented system can fetch perfect context and still produce a hallucinated answer if the prompt does not constrain generation correctly. Common failure modes include ignoring retrieved context (the model answers from parametric memory instead), citation mismatch (the cited source does not actually contain the stated fact), unsupported synthesis (combining facts from multiple sources into a claim none of them individually supports), and refusal failure (the model claims uncertainty even when the answer is clearly in the context).
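Of these failure modes, citation mismatch is the most amenable to a mechanical first-pass check. A minimal sketch in Python, assuming answers cite sources with bracketed IDs like [doc1] (the function name and documents here are hypothetical): it only flags citations that point outside the retrieved set, since verifying that a cited chunk actually supports the claim needs span matching or an entailment check.

```python
import re

def find_unknown_citations(answer: str, retrieved_docs: dict[str, str]) -> list[str]:
    # Collect every bracketed source ID the model cited, e.g. [doc1].
    cited_ids = re.findall(r"\[(\w+)\]", answer)
    # Flag IDs that were never in the retrieved set: a crude but cheap
    # signal of citation mismatch. It cannot catch a real-but-wrong citation.
    return [doc_id for doc_id in cited_ids if doc_id not in retrieved_docs]

answer = "The warranty lasts two years [doc1] and covers batteries [doc9]."
docs = {
    "doc1": "The warranty period is two years from the date of purchase.",
    "doc2": "Standard shipping takes three to five business days.",
}
print(find_unknown_citations(answer, docs))  # ['doc9']
```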
A strong RAG prompt has four explicit components: an instruction to answer only from provided context; the context itself with clear source delimiters and IDs; the user question; and an output format specification that includes how to cite sources. The instruction should also cover the abstention case: what the model should say when the context does not contain enough evidence. "I don't have enough information to answer this from the provided sources" is far better than a confident hallucination.
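As a minimal sketch of how those four components fit together, assuming bracketed [docN] source IDs; the delimiter format, function name, and exact wording are illustrative choices rather than a fixed standard:

```python
ABSTAIN = ("I don't have enough information to answer this "
           "from the provided sources.")

def build_rag_prompt(question: str, docs: dict[str, str]) -> str:
    # Component 2: the context, with clear per-source delimiters and IDs.
    context = "\n\n".join(
        f"<source id={doc_id}>\n{text}\n</source>"
        for doc_id, text in docs.items()
    )
    return (
        # Component 1: answer only from context, abstention case included.
        "Answer the question using ONLY the sources below. If they do not "
        f"contain enough evidence, reply exactly: {ABSTAIN}\n\n"
        f"{context}\n\n"
        # Component 3: the user question.
        f"Question: {question}\n\n"
        # Component 4: output format, including how to cite.
        "Cite every claim with the supporting source ID in square brackets, e.g. [doc2]."
    )
```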
Citation format matters for downstream utility. Simple source IDs like [doc1] are parseable and enable verification workflows. Numbered footnotes work for document-style output. Some systems require the model to quote the supporting span verbatim; this dramatically reduces hallucination because the model cannot synthesise beyond what is literally present. For high-stakes applications like legal or medical RAG, verbatim quotes plus automated verification against the retrieved chunks are worth the extra complexity.
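A sketch of that verification step, assuming the model was instructed to emit each supporting quote as a "..." [docN] pair; the regex and the whitespace normalisation are illustrative assumptions:

```python
import re

# Matches a double-quoted span followed by a bracketed source ID, e.g. "..." [doc1].
QUOTE_PATTERN = re.compile(r'"([^"]+)"\s*\[(\w+)\]')

def normalise(text: str) -> str:
    # Collapse whitespace and case so chunk line-wrapping does not cause false failures.
    return " ".join(text.split()).lower()

def verify_quotes(answer: str, retrieved_docs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (quote, doc_id) pairs that do not appear verbatim in the cited chunk."""
    failures = []
    for quote, doc_id in QUOTE_PATTERN.findall(answer):
        source = retrieved_docs.get(doc_id, "")
        if normalise(quote) not in normalise(source):
            failures.append((quote, doc_id))
    return failures
```

Any failure can then trigger a regeneration or route the answer to human review instead of shipping it.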