
Claude Opus 4.6

1M token context window (beta) with state-of-the-art coding and agentic capabilities. Highest score on Terminal-Bench 2.0 and Humanity's Last Exam. Improved planning, longer task persistence, and better debugging skills. Context compaction and adaptive thinking enable longer-running tasks.

Leaderboard Rank: #1 of 12
ELO Rating: 1780 (#1)
Win Rate: 74.8% (#1)
Latency: 11,547 ms (#6)

Model Information

Provider
Anthropic
License
Proprietary
Input Price per 1M
$5.00
Output Price per 1M
$25.00
Context Window
1M tokens
Release Date
2026-02-05
Model Name
anthropic-claude-opus-4-6
Total Evaluations
990

Performance Record

Wins: 740 (74.7%)
Losses: 93 (9.4%)
Ties: 157 (15.9%)
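The record percentages above are consistent with simple arithmetic over the listed counts and the 990 total evaluations. A quick check in Python:

```python
# Win/loss/tie percentages are just counts over total evaluations.
wins, losses, ties = 740, 93, 157
total = wins + losses + ties          # 990, matching "Total Evaluations"

win_rate = 100 * wins / total         # -> 74.7%
loss_rate = 100 * losses / total      # -> 9.4%
tie_rate = 100 * ties / total         # -> 15.9%
print(f"{win_rate:.1f}% / {loss_rate:.1f}% / {tie_rate:.1f}%")
```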

Performance Overview

ELO ratings by dataset

Claude Opus 4.6's ELO performance varies across different benchmark datasets, showing its strengths in specific domains.
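The page does not state how its ELO ratings are computed, but pairwise leaderboards commonly use the standard Elo model, in which each head-to-head outcome nudges both models' ratings toward the observed result. A minimal sketch under that assumption (the K-factor of 32 and tie-as-0.5 scoring are illustrative choices, not this leaderboard's published methodology):

```python
# Hypothetical sketch of a standard Elo update, as often used for
# pairwise model leaderboards. K-factor and tie handling are assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> tuple[float, float]:
    """score_a: 1.0 = A wins, 0.5 = tie, 0.0 = A loses."""
    ea = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b
```

One property worth noting: the update is zero-sum, so rating points gained by one model are lost by its opponent, which keeps the pool's average rating stable.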

Claude Opus 4.6 - ELO by Dataset

Detailed Metrics

Dataset breakdown

Performance metrics across different benchmark datasets, including quality scores and latency statistics (mean, minimum, and maximum).

PG

ELO: 1807 · Win Rate: 77.3% · Record: 255W-52L-23T

Quality Metrics

Correctness
5.00
Faithfulness
5.00
Grounding
5.00
Relevance
5.00
Completeness
5.00
Overall
5.00

Latency Distribution

Mean
16,812 ms
Min
11,207 ms
Max
26,006 ms

MSMARCO

ELO: 1783 · Win Rate: 81.2% · Record: 268W-16L-46T

Quality Metrics

Correctness
5.00
Faithfulness
5.00
Grounding
5.00
Relevance
5.00
Completeness
5.00
Overall
5.00

Latency Distribution

Mean
7,669 ms
Min
3,748 ms
Max
12,462 ms

SciFact

ELO: 1751 · Win Rate: 65.8% · Record: 217W-25L-88T

Quality Metrics

Correctness
4.55
Faithfulness
4.64
Grounding
4.64
Relevance
5.00
Completeness
4.36
Overall
4.64

Latency Distribution

Mean
10,159 ms
Min
4,747 ms
Max
19,093 ms
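The per-dataset latency blocks above report plain summary statistics over individual request times. A minimal sketch of how such a summary is derived (the sample values below are illustrative, not measurements from this page):

```python
# Illustrative per-request latencies in ms (made-up sample data).
latencies_ms = [4747, 8212, 9890, 10159, 12480, 19093]

summary = {
    "mean": sum(latencies_ms) / len(latencies_ms),
    "min": min(latencies_ms),
    "max": max(latencies_ms),
}
```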

Compare Models

See how it stacks up

Compare Claude Opus 4.6 with other top LLMs to understand the differences in performance, accuracy, and latency.