GPT-5.2
400K context with state-of-the-art long-horizon reasoning and tool-calling for complex multi-step projects. Near 100% accuracy on long-context retrieval tasks (4-needle MRCR to 256k tokens) and 98.7% on multi-turn tool usage. Designed for professional knowledge work with stronger vision, coding, and document analysis capabilities.
Model Information
- Provider: OpenAI
- License: Proprietary
- Input Price per 1M: $1.75
- Output Price per 1M: $14.00
- Context Window: 400K
- Release Date: 2025-12-11
- Model Name: gpt-5.2
- Total Evaluations: 900
Performance Record
Wins: 402 (44.7%)
Losses: 324 (36.0%)
Ties: 174 (19.3%)
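The percentages in the record above follow directly from the raw win/loss/tie counts over the 900 total evaluations. A minimal sketch of that arithmetic:

```python
# Counts taken from the Performance Record above (900 evaluations total).
wins, losses, ties = 402, 324, 174
total = wins + losses + ties  # 900

win_rate = round(100 * wins / total, 1)
loss_rate = round(100 * losses / total, 1)
tie_rate = round(100 * ties / total, 1)

print(win_rate, loss_rate, tie_rate)  # 44.7 36.0 19.3
```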
Performance Overview
ELO ratings by dataset
GPT-5.2's ELO rating varies across benchmark datasets, highlighting its strengths in specific domains.
GPT-5.2 - ELO by Dataset
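As context for reading the per-dataset ratings below, the standard Elo model maps a rating gap to an expected head-to-head score. The page doesn't state its exact rating method, so this sketch assumes the conventional 400-point logistic formula:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, counting a tie as half a win)
    for player A against player B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Example with ratings from the tables below: SciFact (1640) vs MSMARCO (1559).
print(round(elo_expected_score(1640, 1559), 3))  # ~0.614
```

Under this convention, an ~80-point gap corresponds to roughly a 61% expected score, which is why even modest-looking rating differences are meaningful.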
Detailed Metrics
Dataset breakdown
Performance metrics across different benchmark datasets, including accuracy and latency percentiles.
SciFact
ELO 1640 · 41.0% WR · 123W-70L-107T
Quality Metrics
- Correctness: 4.97
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 5.00
- Completeness: 4.83
- Overall: 4.96
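The Overall score appears to be the unweighted mean of the five sub-metrics (this is an inference from the numbers, not a documented definition). A minimal check against the SciFact figures:

```python
# Assumption: Overall = mean of the five quality sub-metrics, rounded to 2 dp.
# SciFact values from the table above.
scores = {
    "Correctness": 4.97,
    "Faithfulness": 5.00,
    "Grounding": 5.00,
    "Relevance": 5.00,
    "Completeness": 4.83,
}
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 4.96, matching the reported Overall
```

The same rule reproduces the MSMARCO (4.97) and PG (4.99) Overall values.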
Latency Distribution
- Mean: 4785ms
- Min: 1318ms
- Max: 10172ms
MSMARCO
ELO 1559 · 28.7% WR · 86W-166L-48T
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 4.97
- Completeness: 4.87
- Overall: 4.97
Latency Distribution
- Mean: 2652ms
- Min: 796ms
- Max: 5810ms
PG
ELO 1555 · 64.3% WR · 193W-88L-19T
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 4.97
- Completeness: 4.97
- Overall: 4.99
Latency Distribution
- Mean: 8702ms
- Min: 2755ms
- Max: 14361ms
Compare Models
See how it stacks up
Compare GPT-5.2 with other top LLMs to understand the differences in performance, accuracy, and latency.