Chain-of-Thought Reasoning
Step-by-step reasoning traces
Chain-of-thought (CoT) prompting asks the model to show its reasoning before producing a final answer. This often improves accuracy substantially on multi-step tasks (arithmetic, logic, planning, and causal reasoning) by pushing the model to decompose the problem rather than pattern-match to a superficial answer.
Chain-of-thought prompting is one of the most influential prompt engineering techniques. The core idea is simple: instead of asking the model to jump directly to an answer, you ask it to "think step by step" or provide examples that include intermediate reasoning. On multi-step tasks (arithmetic word problems, logical deductions, multi-hop questions), CoT can raise accuracy substantially over direct answering, though the size of the gain depends on the model and the task. A common explanation is that a transformer spends a fixed amount of computation on each token it generates: one pass through a fixed number of layers. Writing out reasoning steps effectively gives the model more "compute", because each generated token is another pass and later tokens can condition on the intermediate results already written.
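The compute argument can be written as a rough back-of-the-envelope relation. This is a simplified sketch, not a precise model of any particular architecture, and the symbols are assumptions introduced here for illustration: L is the number of layers applied per generated token, T_r is the number of reasoning tokens, T_a is the number of answer tokens, and D is the serial computation depth available before the last answer token is produced.

```latex
% Assumed symbols (not from the original text):
%   L    number of transformer layers applied per generated token
%   T_r  number of reasoning tokens written before the answer
%   T_a  number of tokens in the final answer
%   D    serial computation depth available to the model
\[
  D_{\text{direct}} \approx L \cdot T_a,
  \qquad
  D_{\text{CoT}} \approx L \cdot (T_r + T_a)
\]
```

Under these assumptions, every reasoning token adds roughly one more pass of depth L that the final answer can build on, which is also why CoT costs more tokens.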
There are two variants. Zero-shot CoT appends a trigger phrase like "Let's think step by step" or "Think through this carefully" to the prompt. This is simple and works surprisingly well. Few-shot CoT provides demonstrations that include the full reasoning trace, not just the final answer. Few-shot CoT is generally more reliable because the demonstrations show the expected depth and format of reasoning, which discourages shallow or tangential chains.
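The two variants differ only in how the prompt is assembled, which a short Python sketch can make concrete. The `Demo` type, the function names, the `Q:`/`A:` layout, and the "The answer is" suffix are all assumptions chosen for illustration. They are not a required format, and sending the prompt to a model is left out.

```python
from dataclasses import dataclass
from typing import Sequence

ZERO_SHOT_TRIGGER = "Let's think step by step."


@dataclass
class Demo:
    """One worked demonstration for few-shot CoT (hypothetical helper type)."""
    question: str
    reasoning: str  # the full intermediate reasoning trace
    answer: str     # the final answer only


def zero_shot_cot(question: str, trigger: str = ZERO_SHOT_TRIGGER) -> str:
    """Append a trigger phrase so the model reasons before answering."""
    return f"Q: {question}\nA: {trigger}"


def few_shot_cot(demos: Sequence[Demo], question: str) -> str:
    """Prepend demonstrations that include reasoning, then pose the new question."""
    blocks = [
        f"Q: {d.question}\nA: {d.reasoning} The answer is {d.answer}."
        for d in demos
    ]
    # The new question ends with an open "A:" for the model to continue.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


demos = [
    Demo(
        question="A shelf holds 3 boxes with 4 books each. How many books are there?",
        reasoning="There are 3 boxes and each has 4 books, so 3 * 4 = 12.",
        answer="12",
    )
]

print(zero_shot_cot("A train travels 60 km in 1.5 hours. What is its average speed?"))
print()
print(few_shot_cot(demos, "A crate holds 5 bags with 6 apples each. How many apples?"))
```

Keeping the demonstrations in a structured type rather than a hand-written string makes it easier to swap examples in and out while holding the reasoning format constant.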
CoT has failure modes. Unfaithful reasoning, where the model produces a plausible-looking chain that does not actually support its final answer or that leads to a wrong one, is common when the task exceeds the model's capability. Long chains can drift off-topic or accumulate errors, since an early mistake carries into every later step. And CoT adds significant token overhead, which raises both latency and cost in high-volume applications. The practical rule is: use CoT when the task requires more than one logical step, verify that the reasoning chain actually supports the final answer, and consider whether the latency and cost tradeoff is acceptable for your use case.
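For arithmetic tasks, part of the "verify the chain" step can be automated. The sketch below is a minimal Python illustration under stated assumptions: the chain writes its steps as `a op b = c` and ends with the phrase "The answer is N". The `check_chain` function and `ChainCheck` type are hypothetical names. The check only tests surface consistency. It cannot tell whether the chain reflects how the model actually arrived at its answer.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

NUM = r"-?\d+(?:\.\d+)?"
# Matches simple binary steps such as "3 * 4 = 12" (assumed chain format).
STEP_RE = re.compile(rf"({NUM})\s*([+\-*/])\s*({NUM})\s*=\s*({NUM})")
# Matches the assumed final-answer phrase "The answer is 17".
ANSWER_RE = re.compile(rf"answer is\s*({NUM})", re.IGNORECASE)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else math.nan,  # NaN never compares close
}


@dataclass
class ChainCheck:
    steps_ok: bool                  # every written step is arithmetically correct
    answer_matches_last_step: bool  # final answer equals the last step's result
    final_answer: Optional[float]   # None if no final answer was found


def check_chain(chain: str) -> ChainCheck:
    """Recompute each written step and compare the final answer to the last one."""
    steps = STEP_RE.findall(chain)
    # A chain with no checkable steps is treated as not verified.
    steps_ok = bool(steps) and all(
        math.isclose(OPS[op](float(a), float(b)), float(claimed), abs_tol=1e-9)
        for a, op, b, claimed in steps
    )
    match = ANSWER_RE.search(chain)
    final = float(match.group(1)) if match else None
    matches_last = (
        final is not None
        and bool(steps)
        and math.isclose(final, float(steps[-1][3]), abs_tol=1e-9)
    )
    return ChainCheck(steps_ok, matches_last, final)


good = "There are 3 boxes with 4 books each, so 3 * 4 = 12. Adding 5 more gives 12 + 5 = 17. The answer is 17."
wrong_step = "There are 3 boxes with 4 books each, so 3 * 4 = 13. The answer is 13."
unsupported = "3 * 4 = 12. Adding 5 more gives 12 + 5 = 17. The answer is 20."

for chain in (good, wrong_step, unsupported):
    print(check_chain(chain))
```

The second example fails because a step is wrong. The third fails because the stated answer does not follow from the steps, which is the case a final-answer-only check would miss. Non-arithmetic chains need a different verifier, such as a second model call or a human review.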
Key Concepts
- CoT asks the model to show reasoning before answering, which often improves multi-step accuracy substantially
- Zero-shot CoT: append "Let's think step by step"; simple and surprisingly effective
- Few-shot CoT: provide examples with full reasoning traces for more reliable chains
- CoT gives the model more effective compute by spreading reasoning across output tokens
- Watch for unfaithful reasoning: plausible chains that do not support the final answer or that lead to wrong answers