GPT-5.2

400K-token context window with state-of-the-art long-horizon reasoning and tool calling for complex, multi-step projects. Near-100% accuracy on long-context retrieval (4-needle MRCR up to 256K tokens) and 98.7% on multi-turn tool use. Designed for professional knowledge work, with stronger vision, coding, and document-analysis capabilities.

Leaderboard Rank

of 11

ELO Rating: 1585
Win Rate: 44.7%
Latency: 5380ms

Model Information

Provider: OpenAI
License: Proprietary
Input Price per 1M tokens: $1.75
Output Price per 1M tokens: $14.00
Context Window: 400K tokens
Release Date: 2025-12-11
Model Name: gpt-5.2
Total Evaluations: 900
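To turn the per-1M-token prices above into a per-request estimate, multiply each token count by its rate and divide by one million. The sketch below assumes flat pricing at exactly the two listed rates; `request_cost_usd` is a hypothetical helper, and any discounts or pricing tiers not shown on this page are ignored.

```python
# Prices listed above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of one request at flat rates.

    Assumes only the two listed rates apply (no discounts or other tiers).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 100K-token prompt with a 2K-token answer.
print(f"${request_cost_usd(100_000, 2_000):.3f}")  # prints "$0.203"
```

Output tokens cost eight times as much as input tokens here, so long answers can dominate the bill even when the prompt is large.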

Performance Record

Wins: 402 (44.7%)

Losses: 324 (36.0%)

Ties: 174 (19.3%)
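Each percentage is that outcome's share of all 900 evaluations, so ties count in the denominator: the 44.7% win rate is 402/900, not wins over decisive games (402/726, about 55.4%). A quick check of the three shares:

```python
# Raw outcome counts from the performance record above.
record = {"Wins": 402, "Losses": 324, "Ties": 174}
total = sum(record.values())  # 900 evaluations in all

# Share of every evaluation (ties included in the denominator).
shares = {outcome: 100 * count / total for outcome, count in record.items()}

for outcome, pct in shares.items():
    print(f"{outcome}: {record[outcome]} ({pct:.1f}%)")
```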

Performance Overview

ELO ratings by dataset

GPT-5.2's ELO rating varies by benchmark dataset, from 1555 on PG and 1559 on MSMARCO to 1640 on SciFact.

GPT-5.2 - ELO by Dataset

Detailed Metrics

Dataset breakdown

Per-dataset results: ELO rating, win rate, head-to-head record (wins-losses-ties), quality scores, and latency (mean, min, max).

SciFact

ELO: 1640 | Win Rate: 41.0% | Record: 123W-70L-107T

Quality Metrics

Correctness: 4.97
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 4.83
Overall: 4.96
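The page does not say how Overall is computed, but the listed values are consistent with an unweighted mean of the five quality metrics; the same holds for the MSMARCO (4.97) and PG (4.99) blocks. A minimal check for SciFact, assuming equal weights:

```python
from statistics import mean

# SciFact quality scores listed above.
scifact = {
    "Correctness": 4.97,
    "Faithfulness": 5.00,
    "Grounding": 5.00,
    "Relevance": 5.00,
    "Completeness": 4.83,
}

# Assumption: Overall is the plain (unweighted) mean of the five metrics.
overall = mean(list(scifact.values()))
print(f"Overall: {overall:.2f}")  # prints "Overall: 4.96"
```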

Latency Distribution

Mean: 4785ms
Min: 1318ms
Max: 10172ms

MSMARCO

ELO: 1559 | Win Rate: 28.7% | Record: 86W-166L-48T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 4.97
Completeness: 4.87
Overall: 4.97

Latency Distribution

Mean: 2652ms
Min: 796ms
Max: 5810ms

PG

ELO: 1555 | Win Rate: 64.3% | Record: 193W-88L-19T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 4.97
Completeness: 4.97
Overall: 4.99

Latency Distribution

Mean: 8702ms
Min: 2755ms
Max: 14361ms
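The headline figures appear to aggregate these three blocks, although the page does not state its method. Summing the per-dataset records reproduces the 402-324-174 total, and because each dataset has 300 evaluations, the unweighted mean of the three mean latencies equals the mean over all 900 runs, consistent with the 5380ms shown. The overall ELO of 1585 also matches the mean of the dataset ELOs, though that may not be how the leaderboard actually computes it. The sketch assumes the compact `xW-yL-zT` record format shown above; `parse_record` is a hypothetical helper.

```python
import re
from statistics import mean

# Per-dataset figures from the breakdown above: (ELO, W-L-T record, mean latency in ms).
datasets = {
    "SciFact": (1640, "123W-70L-107T", 4785),
    "MSMARCO": (1559, "86W-166L-48T", 2652),
    "PG": (1555, "193W-88L-19T", 8702),
}

# Matches the compact record format, e.g. "123W-70L-107T".
RECORD_PATTERN = re.compile(r"(\d+)W-(\d+)L-(\d+)T")


def parse_record(text: str) -> tuple[int, int, int]:
    """Hypothetical helper: split an 'xW-yL-zT' record into (wins, losses, ties)."""
    match = RECORD_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized record: {text!r}")
    wins, losses, ties = (int(group) for group in match.groups())
    return wins, losses, ties


totals = [0, 0, 0]  # running wins, losses, ties across datasets
for name, (elo, record, _latency) in datasets.items():
    counts = parse_record(record)
    totals = [total + count for total, count in zip(totals, counts)]
    win_rate = 100 * counts[0] / sum(counts)  # ties included in the denominator
    print(f"{name}: ELO {elo}, win rate {win_rate:.1f}%")

# Unweighted means across the three datasets (each has 300 evaluations).
mean_elo = mean(elo for elo, _, _ in datasets.values())
mean_latency = mean(ms for _, _, ms in datasets.values())

print("Overall record (W, L, T):", tuple(totals))           # (402, 324, 174)
print(f"Mean of dataset ELOs: {mean_elo:.0f}")               # 1585
print(f"Mean of dataset latencies: {mean_latency:.0f}ms")    # 5380ms
```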

Compare Models

See how it stacks up

Compare GPT-5.2 with other top LLMs to see how they differ in performance, accuracy, and latency.

vs GPT-5.1 (OpenAI)
ELO: 1743
Win Rate: 68.8%

vs Grok 4 Fast (xAI)
ELO: 1645
Win Rate: 58.3%

vs Gemini 3 Flash (Google)
ELO: 1607
Win Rate: 61.0%
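The page does not document its rating formula. Assuming a standard Elo scale (a 400-point logistic curve), a rating gap maps to an expected score: the share of points one model should take against the other, with ties worth half a point. The sketch applies that assumption to the ratings listed on this page; `expected_score` is a hypothetical helper, not part of any leaderboard API.

```python
# Assumption: standard Elo, where a 400-point gap means 10:1 expected odds.
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (win = 1, tie = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


gpt_5_2 = 1585  # GPT-5.2's overall ELO from this page
rivals = {"GPT-5.1": 1743, "Grok 4 Fast": 1645, "Gemini 3 Flash": 1607}

for name, rating in rivals.items():
    score = expected_score(gpt_5_2, rating)
    print(f"GPT-5.2 vs {name}: expected score {score:.2f}")
```

Under this assumption, the 158-point gap to GPT-5.1 gives GPT-5.2 an expected score of roughly 0.29, while the 22-point gap to Gemini 3 Flash leaves the pairing close to even.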

Agentset
