
agentic-ai-patterns

Self-RAG

Fully autonomous retrieval loop

The LLM controls every decision in the RAG pipeline: whether to retrieve, whether chunks are relevant, and whether its own generated answer is faithful and useful.

```mermaid
flowchart TD
    S([__start__]) --> D[decide_retrieval]
    D -->|needs DB| R[retrieve]
    D -->|knows answer| N[generate]
    R --> GC[grade_chunks]
    GC --> N
    N --> C[self_critique]
    C -->|ok| E([__end__])
    C -->|retry| R
```

Self-RAG gives the LLM full control over the retrieval-generation pipeline. The first decision is: "Do I need to retrieve data at all, or do I already know the answer?" If retrieval is needed, the model then grades each chunk for relevance before passing it to the generator.
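These first two decisions can be sketched in a few lines. Everything here is a hypothetical stand-in: `llm_judge` replaces a real LLM yes/no call with a toy keyword-overlap heuristic, and `known_topics` is an invented proxy for "things the model already knows".

```python
def llm_judge(question: str, text: str) -> bool:
    """Hypothetical stand-in for an LLM relevance verdict.
    Toy heuristic: any shared keyword counts as a 'yes'."""
    q_words = set(question.lower().split())
    return bool(q_words & set(text.lower().split()))

def decide_retrieval(question: str, known_topics: set[str]) -> bool:
    """'Do I need to retrieve, or do I already know the answer?'
    Toy proxy: retrieve when the question mentions no known topic."""
    return not (set(question.lower().split()) & known_topics)

def grade_chunks(question: str, chunks: list[str]) -> list[str]:
    """Keep only the chunks judged relevant to the question."""
    return [c for c in chunks if llm_judge(question, c)]
```

A real implementation would replace both heuristics with prompted model calls that return a parsed verdict; the control flow stays the same.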

After generation, the model performs a self-critique: "Is my answer faithful to the retrieved evidence? Is it actually useful for the question?" If either check fails, the system retries retrieval — up to `MAX_RETRIES` times — with a refined query. Only when both checks pass (or the retry budget is exhausted) does it exit to `__end__`.
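The critique-and-retry loop can be sketched as plain Python. `retrieve`, `generate`, and `self_critique` are all hypothetical stubs (keyword matching and substring checks in place of a vector store and LLM calls); only the loop structure mirrors the pattern.

```python
MAX_RETRIES = 2

def retrieve(query: str) -> list[str]:
    """Hypothetical retriever over a two-document toy corpus."""
    corpus = ["refund policy lasts 30 days", "shipping takes 5 days"]
    return [c for c in corpus if set(query.lower().split()) & set(c.split())]

def generate(question: str, chunks: list[str]) -> str:
    """Hypothetical generator: a real system would prompt an LLM."""
    return chunks[0] if chunks else "I don't know."

def self_critique(answer: str, chunks: list[str]) -> bool:
    """Faithfulness check: is the answer grounded in the evidence?"""
    return any(answer in c or c in answer for c in chunks)

def self_rag(question: str) -> str:
    query = question
    for attempt in range(MAX_RETRIES + 1):
        chunks = retrieve(query)
        answer = generate(question, chunks)
        if self_critique(answer, chunks):
            return answer
        # Critique failed: refine the query and retry retrieval.
        query = f"{question} (attempt {attempt + 1})"
    return answer  # best effort after exhausting retries
```

Note that the loop re-enters at `retrieve`, matching the `retry` edge in the diagram: the question is unchanged, only the retrieval query is refined.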

This pattern is the most autonomous form of RAG and produces the highest-quality answers — but at the cost of variable latency and higher token consumption. It's well suited for high-stakes question-answering where correctness matters more than speed.