GPT-5.1
GPT-5.1 offers a 400K-token context window with adaptive reasoning that allocates more processing to complex retrieved content. Extended prompt caching with longer retention improves performance for production RAG systems.
Model Information
- Provider: OpenAI
- License: Proprietary
- Input Price per 1M tokens: $1.25
- Output Price per 1M tokens: $10.00
- Context Window: 400K tokens
- Release Date: 2025-11-13
- Model Name: gpt-5.1
- Total Evaluations: 810
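As a rough illustration of what the listed prices mean per request, the sketch below estimates cost from token counts. The function name `estimate_cost` and the example token counts are hypothetical, and the calculation uses list prices only: it ignores any prompt-caching discount, since this page does not state one.

```python
# Prices copied from the Model Information list (USD per 1M tokens).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00
TOKENS_PER_M = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list prices.

    Assumption: no caching discount or other pricing tier is applied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = output_tokens / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical RAG request: 20,000 tokens of retrieved context, 500-token answer.
print(f"${estimate_cost(20_000, 500):.4f}")
```

At these prices, output tokens cost eight times as much as input tokens, so for retrieval-heavy prompts with short answers the input side tends to dominate the bill.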
Performance Record
Wins: 561 (69.3%)
Losses: 117 (14.4%)
Ties: 132 (16.3%)
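The percentages in the performance record can be reproduced from the raw counts. The sketch below assumes each share is the count divided by all evaluations (wins + losses + ties); the `Record` class is a hypothetical helper, not part of any published tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Win/loss/tie counts for a set of head-to-head evaluations."""
    wins: int
    losses: int
    ties: int

    @property
    def total(self) -> int:
        return self.wins + self.losses + self.ties

    def share(self, count: int) -> float:
        """Return `count` as a percentage of all evaluations."""
        return 100.0 * count / self.total


# Counts copied from the performance record.
overall = Record(wins=561, losses=117, ties=132)

print(f"Total evaluations: {overall.total}")
for label, count in (("Wins", overall.wins),
                     ("Losses", overall.losses),
                     ("Ties", overall.ties)):
    print(f"{label}: {count} ({overall.share(count):.1f}%)")
```

The total matches the 810 evaluations listed under Model Information, and the three shares round to the percentages shown.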
Performance Overview
ELO ratings by dataset
GPT-5.1's ELO rating varies by benchmark dataset, from 1646 on SciFact to 1867 on PG.
GPT-5.1 - ELO by Dataset
Detailed Metrics
Dataset breakdown
Performance metrics for each benchmark dataset, including quality scores and latency statistics (mean, minimum, maximum).
PG
ELO 1867 · 87.0% WR · 235W-29L-6T
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 5.00
- Completeness: 4.73
- Overall: 4.95
Latency Distribution
- Mean: 29008 ms
- Min: 4393 ms
- Max: 43887 ms
MSMARCO
ELO 1688 · 55.6% WR · 150W-63L-57T
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 5.00
- Completeness: 4.93
- Overall: 4.99
Latency Distribution
- Mean: 9111 ms
- Min: 3841 ms
- Max: 34731 ms
SciFact
ELO 1646 · 65.2% WR · 176W-25L-69T
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 5.00
- Completeness: 4.97
- Overall: 4.99
Latency Distribution
- Mean: 10454 ms
- Min: 4700 ms
- Max: 21205 ms
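The per-dataset figures above can be cross-checked with a short script. This is a sketch under two assumptions that the page does not state explicitly: the win rate (WR) counts ties in the denominator, and the Overall score is the unweighted mean of the five quality metrics. Both assumptions happen to reproduce the displayed numbers, but that is an inference from the data, not documented methodology. The `datasets` dictionary and helper names are hypothetical.

```python
from statistics import mean

# Figures copied from the dataset breakdown.
# record = (wins, losses, ties); scores = correctness, faithfulness,
# grounding, relevance, completeness.
datasets = {
    "PG": {"record": (235, 29, 6), "win_rate": 87.0,
           "scores": [5.00, 5.00, 5.00, 5.00, 4.73], "overall": 4.95},
    "MSMARCO": {"record": (150, 63, 57), "win_rate": 55.6,
                "scores": [5.00, 5.00, 5.00, 5.00, 4.93], "overall": 4.99},
    "SciFact": {"record": (176, 25, 69), "win_rate": 65.2,
                "scores": [5.00, 5.00, 5.00, 5.00, 4.97], "overall": 4.99},
}


def win_rate(wins: int, losses: int, ties: int) -> float:
    """Assumed definition: wins as a percentage of all evaluations."""
    return 100.0 * wins / (wins + losses + ties)


def check(entry: dict) -> tuple[bool, bool]:
    """Compare recomputed values to the page at its displayed precision."""
    wr_ok = f"{win_rate(*entry['record']):.1f}" == f"{entry['win_rate']:.1f}"
    overall_ok = f"{mean(entry['scores']):.2f}" == f"{entry['overall']:.2f}"
    return wr_ok, overall_ok


for name, entry in datasets.items():
    wr_ok, overall_ok = check(entry)
    print(f"{name}: win rate matches={wr_ok}, overall matches={overall_ok}")

# The per-dataset records should add up to the overall performance record.
total_wins = sum(e["record"][0] for e in datasets.values())
total_losses = sum(e["record"][1] for e in datasets.values())
total_ties = sum(e["record"][2] for e in datasets.values())
print(total_wins, total_losses, total_ties)
```

Each dataset contributes 270 evaluations, and the summed wins, losses, and ties equal the overall performance record of 561, 117, and 132.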
Compare Models
See how it stacks up
Compare GPT-5.1 with other top LLMs to understand the differences in performance, accuracy, and latency.