GPT-5.2
400K context with state-of-the-art long-horizon reasoning and tool-calling for complex multi-step projects. Near 100% accuracy on long-context retrieval tasks (4-needle MRCR to 256k tokens) and 98.7% on multi-turn tool usage. Designed for professional knowledge work with stronger vision, coding, and document analysis capabilities.
Model Information
- Provider
- OpenAI
- License
- Proprietary
- Input Price per 1M
- $1.75
- Output Price per 1M
- $14.00
- Context Window
- 400K
- Release Date
- 2025-12-11
- Model Name
- gpt-5.2
- Total Evaluations
- 810
Performance Record
- Wins
- 370 (45.7%)
- Losses
- 282 (34.8%)
- Ties
- 158 (19.5%)
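The percentages above follow from the raw record: ties count in the denominator but not the numerator, so the win rate is W / (W + L + T). A minimal sketch (function name is ours, not the site's):

```python
def win_rate(wins: int, losses: int, ties: int) -> float:
    """Win rate as a percentage of all evaluations.

    Ties stay in the denominator but do not count as wins,
    which matches the figures reported above.
    """
    total = wins + losses + ties
    return 100.0 * wins / total

# Overall record from the table above: 370W-282L-158T out of 810
print(f"{win_rate(370, 282, 158):.1f}%")  # 45.7%
```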
Performance Overview
ELO ratings by dataset
GPT-5.2's ELO rating varies across benchmark datasets, ranging from 1551 on MSMARCO to 1631 on SciFact, highlighting its relative strengths by domain.
GPT-5.2 - ELO by Dataset
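Per-dataset ELO ratings like these are typically derived from pairwise win/loss/tie outcomes against other models. The page does not state the exact K-factor or rating anchor, so the sketch below uses the standard logistic Elo update with a hypothetical K = 32:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that the model rated r_a beats the one rated r_b
    under the standard logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One rating update after a head-to-head comparison.

    score_a is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    The update is zero-sum: what one model gains, the other loses.
    """
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

# Two equally rated models: the winner gains k/2 = 16 points
print(update(1600.0, 1600.0, 1.0))  # (1616.0, 1584.0)
```

Leaderboards often fit all ratings jointly (e.g. via a Bradley-Terry model) rather than applying sequential updates, so treat this as an illustration of the rating scale, not the site's exact method.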
Detailed Metrics
Dataset breakdown
Performance metrics across benchmark datasets, including quality scores and latency statistics (mean, min, max).
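The latency tables below report only mean, min, and max. Given raw per-request latencies (the sample values here are made up for illustration), quartiles could be computed with the standard library:

```python
import statistics

# Hypothetical per-request latencies in milliseconds for one dataset;
# the real per-request samples are not published on this page.
samples = [1318, 2100, 3500, 4785, 6200, 8100, 10172]

print("min :", min(samples))                  # 1318
print("mean:", round(statistics.fmean(samples)))
print("max :", max(samples))                  # 10172

# Quartile cut points (exclusive method); q[1] is the median
q = statistics.quantiles(samples, n=4)
print("quartiles:", q)  # [2100.0, 4785.0, 8100.0]
```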
SciFact
ELO 1631 · 42.6% WR · 115W-61L-94T
Quality Metrics
- Correctness
- 4.87
- Faithfulness
- 5.00
- Grounding
- 4.97
- Relevance
- 4.97
- Completeness
- 4.73
- Overall
- 4.91
Latency Distribution
- Mean
- 4785ms
- Min
- 1318ms
- Max
- 10172ms
PG
ELO 1587 · 64.8% WR · 175W-77L-18T
Quality Metrics
- Correctness
- 5.00
- Faithfulness
- 5.00
- Grounding
- 5.00
- Relevance
- 5.00
- Completeness
- 4.97
- Overall
- 4.99
Latency Distribution
- Mean
- 8702ms
- Min
- 2755ms
- Max
- 14361ms
MSMARCO
ELO 1551 · 29.6% WR · 80W-144L-46T
Quality Metrics
- Correctness
- 5.00
- Faithfulness
- 5.00
- Grounding
- 5.00
- Relevance
- 4.97
- Completeness
- 4.87
- Overall
- 4.97
Latency Distribution
- Mean
- 2652ms
- Min
- 796ms
- Max
- 5810ms
Compare Models
See how it stacks up
Compare GPT-5.2 with other top LLMs to understand the differences in performance, accuracy, and latency.