Claude Opus 4.6
1M token context window (beta) with state-of-the-art coding and agentic capabilities. Highest score on Terminal-Bench 2.0 and Humanity's Last Exam. Improved planning, longer task persistence, and better debugging skills. Context compaction and adaptive thinking enable longer-running tasks.
Model Information
- Provider
- Anthropic
- License
- Proprietary
- Input Price per 1M
- $5.00
- Output Price per 1M
- $25.00
- Context Window
- 1M tokens
- Release Date
- 2026-02-05
- Model Name
- anthropic-claude-opus-4-6
- Total Evaluations
- 990
Performance Record
- Wins
- 740 (74.7%)
- Losses
- 93 (9.4%)
- Ties
- 157 (15.9%)
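For reference, each percentage above is that outcome's count divided by the total number of evaluations, with ties counted as neither wins nor losses. A minimal Python sketch of that arithmetic, using the record from this page; the count-over-total formula is an assumption that reproduces the displayed percentages:

```python
# Win/loss/tie percentages as shown in the Performance Record.
# Counts come from this page; percentage = count / total is an
# assumed formula that matches the displayed values.
wins, losses, ties = 740, 93, 157
total = wins + losses + ties  # 990 total evaluations

for label, count in (("Wins", wins), ("Losses", losses), ("Ties", ties)):
    print(f"{label}: {count} ({count / total:.1%})")
# Wins: 740 (74.7%)
# Losses: 93 (9.4%)
# Ties: 157 (15.9%)
```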
Performance Overview
ELO ratings by dataset
Claude Opus 4.6's ELO ratings vary across benchmark datasets, highlighting its strengths in specific domains.
[Chart: Claude Opus 4.6 - ELO by Dataset]
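The page does not document which rating variant it uses. For orientation only, the textbook Elo expected score between two ratings is 1 / (1 + 10^((Rb - Ra) / 400)); the sketch below applies it to two of the per-dataset ratings listed under Detailed Metrics. Treat the formula choice as an assumption, not this leaderboard's documented method:

```python
# Textbook Elo expected score: the probability that a contestant
# rated r_a beats one rated r_b. Whether this leaderboard uses this
# exact formula (vs. a variant such as Bradley-Terry) is an assumption.
def elo_expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Rating gap between the PG (1807) and SciFact (1751) entries below:
print(f"{elo_expected(1807, 1751):.3f}")  # -> 0.580
```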
Detailed Metrics
Dataset breakdown
Performance metrics across benchmark datasets, including quality scores and latency statistics (mean, min, max).
PG
ELO 1807 | Win Rate 77.3% | 255W-52L-23T
Quality Metrics
- Correctness
- 5.00
- Faithfulness
- 5.00
- Grounding
- 5.00
- Relevance
- 5.00
- Completeness
- 5.00
- Overall
- 5.00
Latency Distribution
- Mean
- 16812ms
- Min
- 11207ms
- Max
- 26006ms
MSMARCO
ELO 1783 | Win Rate 81.2% | 268W-16L-46T
Quality Metrics
- Correctness
- 5.00
- Faithfulness
- 5.00
- Grounding
- 5.00
- Relevance
- 5.00
- Completeness
- 5.00
- Overall
- 5.00
Latency Distribution
- Mean
- 7669ms
- Min
- 3748ms
- Max
- 12462ms
SciFact
ELO 1751 | Win Rate 65.8% | 217W-25L-88T
Quality Metrics
- Correctness
- 4.55
- Faithfulness
- 4.64
- Grounding
- 4.64
- Relevance
- 5.00
- Completeness
- 4.36
- Overall
- 4.64
Latency Distribution
- Mean
- 10159ms
- Min
- 4747ms
- Max
- 19093ms
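As a consistency check, the three per-dataset records above sum exactly to the overall Performance Record (740W-93L-157T). A short sketch, using only numbers from this page:

```python
# Per-dataset (wins, losses, ties) records from the breakdown above.
records = {
    "PG":      (255, 52, 23),
    "MSMARCO": (268, 16, 46),
    "SciFact": (217, 25, 88),
}

# Sum each column (wins, losses, ties) across datasets.
wins, losses, ties = (sum(col) for col in zip(*records.values()))
print(wins, losses, ties)  # 740 93 157 -- matches the overall record
```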
Compare Models
See how it stacks up
Compare Claude Opus 4.6 with other top LLMs to understand the differences in performance, accuracy, and latency.