GPT-5.1

GPT-5.1 offers a 400K-token context window with adaptive reasoning that allocates more processing to complex retrieved content. Extended prompt caching with longer retention improves performance for production RAG systems.

Leaderboard Rank: #1 of 10
ELO Rating: 1711 (#1)
Win Rate: 69.3% (#1)
Latency: 16191ms (#7)

Model Information

Provider: OpenAI
License: Proprietary
Input Price per 1M tokens: $1.25
Output Price per 1M tokens: $10.00
Context Window: 400K
Release Date: 2025-11-13
Model Name: gpt-5.1
Total Evaluations: 810
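
As a rough illustration of what the listed prices mean per request, the sketch below estimates cost from token counts. It assumes the two prices above apply per 1M tokens at a flat rate, with no discount for cached input; the function name `estimate_cost` and the example token counts are hypothetical.

```python
# Prices taken from the Model Information listing (USD per 1M tokens).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at flat list prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical RAG call: 20K tokens of retrieved context, 500-token answer.
    print(f"${estimate_cost(20_000, 500):.4f}")
```

For retrieval-heavy prompts the input side usually dominates the token count, but output tokens cost eight times as much each, so long answers can still drive the bill.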

Performance Record

Wins: 561 (69.3%)
Losses: 117 (14.4%)
Ties: 132 (16.3%)
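
The overall record is consistent with the sum of the three per-dataset records listed under Detailed Metrics. The sketch below reproduces the arithmetic, assuming win rate is wins divided by all evaluations with ties counted as non-wins; the page does not state the formula, but this assumption matches every percentage shown. The names `RECORDS`, `win_rate`, and `totals` are illustrative.

```python
# (wins, losses, ties) per dataset, as listed in the dataset breakdown.
RECORDS = {
    "PG": (235, 29, 6),
    "MSMARCO": (150, 63, 57),
    "SciFact": (176, 25, 69),
}


def win_rate(wins: int, losses: int, ties: int) -> float:
    """Win rate in percent, with ties counted in the denominator only."""
    return 100 * wins / (wins + losses + ties)


def totals(records: dict) -> tuple:
    """Sum wins, losses, and ties across all datasets."""
    wins = sum(r[0] for r in records.values())
    losses = sum(r[1] for r in records.values())
    ties = sum(r[2] for r in records.values())
    return wins, losses, ties


if __name__ == "__main__":
    for name, record in RECORDS.items():
        print(name, round(win_rate(*record), 1))
    print("Overall", totals(RECORDS), round(win_rate(*totals(RECORDS)), 1))
```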

Performance Overview

ELO ratings by dataset

GPT-5.1's ELO rating varies across benchmark datasets, from 1646 on SciFact to 1867 on PG.

GPT-5.1 - ELO by Dataset

Detailed Metrics

Dataset breakdown

Performance metrics for each benchmark dataset, including quality scores and latency statistics (mean, min, max).

PG

ELO 1867 | 87.0% WR | 235W-29L-6T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 4.73
Overall: 4.95

Latency Distribution

Mean: 29008ms
Min: 4393ms
Max: 43887ms

MSMARCO

ELO 1688 | 55.6% WR | 150W-63L-57T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 4.93
Overall: 4.99

Latency Distribution

Mean: 9111ms
Min: 3841ms
Max: 34731ms

SciFact

ELO 1646 | 65.2% WR | 176W-25L-69T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 4.97
Overall: 4.99

Latency Distribution

Mean: 10454ms
Min: 4700ms
Max: 21205ms
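
The page does not say how the Overall quality score or the headline latency figure is derived. As a sanity check, the sketch below shows that a plain average reproduces both: each dataset's Overall matches the mean of its five quality metrics, and the headline 16191ms matches the mean of the three per-dataset mean latencies. Treat the averaging rule as an assumption inferred from the numbers, not a documented method; the names `QUALITY` and `MEAN_LATENCY_MS` are illustrative.

```python
from statistics import mean

# Correctness, Faithfulness, Grounding, Relevance, Completeness per dataset.
QUALITY = {
    "PG": [5.00, 5.00, 5.00, 5.00, 4.73],
    "MSMARCO": [5.00, 5.00, 5.00, 5.00, 4.93],
    "SciFact": [5.00, 5.00, 5.00, 5.00, 4.97],
}

# Mean latency per dataset in milliseconds.
MEAN_LATENCY_MS = {"PG": 29008, "MSMARCO": 9111, "SciFact": 10454}


def overall_score(scores: list) -> float:
    """Assumed rule: Overall is the unweighted mean of the five metrics."""
    return round(mean(scores), 2)


if __name__ == "__main__":
    for name, scores in QUALITY.items():
        print(name, overall_score(scores))
    print("Headline latency (ms):", round(mean(MEAN_LATENCY_MS.values())))
```

Because each dataset contributes the same number of evaluations (270), a weighted and an unweighted average of the per-dataset latencies give the same result here.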

Compare Models

See how it stacks up

Compare GPT-5.1 with other top LLMs to understand the differences in performance, accuracy, and latency.