Summary
Self-Consistency is a decoding strategy that samples multiple reasoning paths from the LLM and selects the most frequent answer. By aggregating diverse reasoning traces, this technique reduces errors from single-path reasoning and improves answer reliability. It builds on Chain of Thought by sampling multiple CoT outputs rather than using greedy decoding.
How it works
- Generate multiple paths: Sample N reasoning traces for the same problem
- Extract answers: Parse final answers from each trace
- Vote/aggregate: Select the most common answer
- Return result: Output the consensus answer
Key considerations
- Sample count: More samples improve accuracy but raise cost and latency
- Aggregation: Majority vote, weighted, or confidence-based
- Diversity: Temperature and sampling parameters affect variety
- Speed: Parallel generation can mitigate latency
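As a sketch of the non-majority aggregation options mentioned above, votes can be weighted per trace. The weights here are hypothetical; in practice they might come from sequence log-probabilities or an external verifier score:

```python
from collections import defaultdict


def weighted_vote(samples: list[tuple[str, float]]) -> str:
    """Aggregate (answer, weight) pairs and return the answer with the
    highest total weight. With all weights equal to 1.0 this reduces to
    a plain majority vote."""
    totals: defaultdict[str, float] = defaultdict(float)
    for answer, weight in samples:
        totals[answer] += weight
    return max(totals, key=totals.get)


# Three low-confidence traces agree on "14"; one confident outlier says "11".
samples = [("14", 0.40), ("14", 0.35), ("11", 0.90), ("14", 0.30)]
print(weighted_vote(samples))  # "14" totals 1.05 vs 0.90 → 14
```

Weighting lets a cluster of moderately confident agreeing traces outweigh a single confident outlier, at the cost of needing a trustworthy per-trace score.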
When to use
- Tasks where multiple valid reasoning paths lead to the same answer
- Applications requiring high reliability
- Scenarios where cost-latency trade-off is acceptable
- Math, logic, and factual reasoning tasks