GPT-5.4

A frontier model with a 1M+ token context window, built for complex professional work. It supports native computer use, web search, file search, code interpreter, and MCP integration, with adjustable reasoning effort levels from none to xhigh. It shows a 33% reduction in hallucinations compared with GPT-5.2. If you want to compare the best LLMs for your data, try Agentset.

Leaderboard Rank: #10 of 16
ELO Rating: 1418 (rank #10)
Win Rate: 31.9% (rank #11)
Latency: 3108ms

Model Information

Provider: OpenAI
License: Proprietary
Input Price per 1M tokens: $2.50
Output Price per 1M tokens: $15.00
Context Window: 1,050K tokens
Release Date: 2026-03-05
Model Name: gpt-5.4
Total Evaluations: 1350
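
To make the per-token pricing concrete, here is a minimal sketch of a cost estimate for one request. The prices come from the list above; the helper name `estimateCostUsd` and the example token counts are assumptions for illustration, not part of any SDK.

```typescript
// Prices copied from the Model Information list (USD per 1M tokens).
const INPUT_PRICE_PER_1M = 2.5;
const OUTPUT_PRICE_PER_1M = 15.0;

// Hypothetical helper: estimates the cost of a single request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_1M;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_1M;
  return inputCost + outputCost;
}

// Example (assumed sizes): 10,000 prompt tokens and 1,000 completion tokens.
console.log(estimateCostUsd(10_000, 1_000).toFixed(4));
```

With these assumed sizes the request costs about $0.04. Output tokens are six times more expensive than input tokens, so long answers dominate the bill faster than long prompts.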

Performance Record

Wins: 431 (31.9%)
Losses: 721 (53.4%)
Ties: 198 (14.7%)
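
The percentages in the performance record follow directly from the raw counts. The sketch below recomputes them; the type and function names are illustrative assumptions, and only the counts come from this page.

```typescript
// Win/loss/tie counts as listed in the Performance Record.
interface MatchRecord {
  wins: number;
  losses: number;
  ties: number;
}

// Formats a share of the total as a percentage with one decimal place.
function toPercent(count: number, total: number): string {
  return ((count / total) * 100).toFixed(1) + "%";
}

const overallRecord: MatchRecord = { wins: 431, losses: 721, ties: 198 };
const totalMatches =
  overallRecord.wins + overallRecord.losses + overallRecord.ties;

console.log(totalMatches); // 1350, matching Total Evaluations
console.log(toPercent(overallRecord.wins, totalMatches)); // "31.9%"
console.log(toPercent(overallRecord.losses, totalMatches)); // "53.4%"
console.log(toPercent(overallRecord.ties, totalMatches)); // "14.7%"
```

Note that the win rate here counts ties in the denominator, so a tie lowers the win rate without counting as a loss.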


Performance Overview

ELO ratings by dataset

GPT-5.4's ELO rating varies by benchmark dataset: it scores highest on SciFact (1559) and lower on MSMARCO (1348) and PG (1347).

GPT-5.4 - ELO by Dataset

Detailed Metrics

Dataset breakdown

Performance metrics for each benchmark dataset, including quality scores and latency statistics (mean, min, max).

SciFact

ELO: 1559
Win Rate: 30.9%
Record: 139W-204L-107T

Quality Metrics

Correctness: 4.87
Faithfulness: 4.87
Grounding: 4.87
Relevance: 4.93
Completeness: 4.80
Overall: 4.87

Latency Distribution

Mean: 2165ms
Min: 1207ms
Max: 4297ms

MSMARCO

ELO: 1348
Win Rate: 23.6%
Record: 106W-291L-53T

Quality Metrics

Correctness: 4.97
Faithfulness: 4.97
Grounding: 4.97
Relevance: 4.93
Completeness: 4.80
Overall: 4.93

Latency Distribution

Mean: 1861ms
Min: 888ms
Max: 3548ms

PG

ELO: 1347
Win Rate: 41.3%
Record: 186W-226L-38T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 5.00
Overall: 5.00

Latency Distribution

Mean: 5296ms
Min: 2948ms
Max: 17651ms
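
As a consistency check, the per-dataset records can be summed and compared with the overall performance record. The sketch below does that with the figures from this page. It also averages the three ELO ratings; the fact that this unweighted mean equals the headline rating of 1418 is an observation from the numbers, not a documented description of how the leaderboard aggregates scores. All type and variable names are assumptions.

```typescript
// Per-dataset figures copied from the dataset breakdown.
interface DatasetResult {
  name: string;
  elo: number;
  wins: number;
  losses: number;
  ties: number;
}

const datasetResults: DatasetResult[] = [
  { name: "SciFact", elo: 1559, wins: 139, losses: 204, ties: 107 },
  { name: "MSMARCO", elo: 1348, wins: 106, losses: 291, ties: 53 },
  { name: "PG", elo: 1347, wins: 186, losses: 226, ties: 38 },
];

// Sums a list of numbers.
const sumOf = (values: number[]): number =>
  values.reduce((acc, value) => acc + value, 0);

// Totals across datasets, to compare with the overall record (431-721-198).
const recordTotals = {
  wins: sumOf(datasetResults.map((d) => d.wins)),
  losses: sumOf(datasetResults.map((d) => d.losses)),
  ties: sumOf(datasetResults.map((d) => d.ties)),
};

// Unweighted mean of the per-dataset ELO ratings, rounded to an integer.
const meanElo = Math.round(
  sumOf(datasetResults.map((d) => d.elo)) / datasetResults.length
);

console.log(recordTotals); // { wins: 431, losses: 721, ties: 198 }
console.log(meanElo); // 1418
```

Each dataset contributes 450 comparisons, which accounts for the 1350 total evaluations.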

Build RAG in Minutes, Not Months

Agentset gives you a complete RAG API with top-ranked LLMs and smart retrieval built in. Upload your data, call the API, and get grounded answers from day one.


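import { Agentset } from "agentset";

// Create the client and select a namespace ("ns_1234" is a placeholder ID).
const agentset = new Agentset();
const ns = agentset.namespace("ns_1234");

// Search the namespace for passages relevant to the query.
const results = await ns.search(
  "What is multi-head attention?"
);

// Print the text of each returned result.
for (const result of results) {
  console.log(result.text);
}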