GPT-5.4 Pro

Deep reasoning model that spends more compute on thinking to produce more precise responses on complex tasks. It runs through the Responses API with multi-turn reasoning, and some requests may take several minutes. It does not support structured outputs or fine-tuning.
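
As a rough illustration of the request pattern described above, the sketch below builds a call to OpenAI's Responses API endpoint and applies a generous client-side timeout, since some requests may take several minutes. The helper names (`buildResponsesRequest`, `callModel`), the 10-minute timeout, and passing the API key as an argument are assumptions made for illustration; none of them come from this page.

```typescript
// Shape of the request we hand to fetch (hypothetical helper type).
interface ResponsesRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the HTTP request for the Responses API.
// The model name "gpt-5.4-pro" is the one listed on this page.
function buildResponsesRequest(prompt: string, apiKey: string): ResponsesRequest {
  return {
    url: "https://api.openai.com/v1/responses",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "gpt-5.4-pro", input: prompt }),
    },
  };
}

// Assumed value: 10 minutes, chosen because requests may run for several minutes.
const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000;

// Sends the request and returns the parsed JSON body without interpreting it.
async function callModel(
  prompt: string,
  apiKey: string,
  timeoutMs: number = DEFAULT_TIMEOUT_MS
): Promise<unknown> {
  const { url, init } = buildResponsesRequest(prompt, apiKey);
  const res = await fetch(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) {
    throw new Error(`Responses API request failed with status ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before committing to a long-running, comparatively expensive request.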

Leaderboard Rank: #13 of 16
ELO Rating: 1330 (rank #13)
Win Rate: 24.6% (rank #13)
Latency: 75663ms (rank #16)

Model Information

Provider: OpenAI
License: Proprietary
Input Price per 1M: $30.00
Output Price per 1M: $180.00
Context Window: 1050K
Release Date: 2026-03-05
Model Name: gpt-5.4-pro
Total Evaluations: 1350
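
To make the per-token pricing above concrete, here is a minimal sketch that estimates the cost of a single request from its token counts. The function name `estimateCostUsd` is hypothetical, and the calculation assumes billing is strictly linear in input and output tokens at the listed rates, which this page does not confirm.

```typescript
// Prices listed on this page, in USD per 1M tokens.
const INPUT_PRICE_PER_1M_USD = 30.0;
const OUTPUT_PRICE_PER_1M_USD = 180.0;

// Hypothetical helper: linear cost estimate from token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_1M_USD;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_1M_USD;
  return inputCost + outputCost;
}

// 1M input tokens plus 1M output tokens at the listed rates.
console.log(estimateCostUsd(1_000_000, 1_000_000)); // 210
```

Because output tokens cost six times as much as input tokens here, long generated answers dominate the bill far more than long prompts do.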

Performance Record

Wins: 332 (24.6%)
Losses: 846 (62.7%)
Ties: 172 (12.7%)
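
The percentages in the performance record follow directly from the raw counts. The sketch below recomputes them, assuming each share is simply the count divided by the total number of evaluations and rounded to one decimal place; the type and function names are hypothetical.

```typescript
interface MatchRecord {
  wins: number;
  losses: number;
  ties: number;
}

// Rounds a fraction to a percentage with one decimal place.
function toPercent(count: number, total: number): number {
  return Math.round((count / total) * 1000) / 10;
}

// Hypothetical helper: share of wins, losses, and ties as percentages.
function recordShares(record: MatchRecord): MatchRecord {
  const total = record.wins + record.losses + record.ties;
  return {
    wins: toPercent(record.wins, total),
    losses: toPercent(record.losses, total),
    ties: toPercent(record.ties, total),
  };
}

// Counts taken from the performance record on this page (1350 evaluations).
const overall: MatchRecord = { wins: 332, losses: 846, ties: 172 };
console.log(recordShares(overall)); // { wins: 24.6, losses: 62.7, ties: 12.7 }
```

Note that ties count toward the denominator under this assumption, so the win rate is measured against all evaluations and not only against decided matches.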


Performance Overview

ELO ratings by dataset

GPT-5.4 Pro's ELO rating varies considerably across benchmark datasets, from 1207 on MSMARCO to 1507 on SciFact.

[Chart: GPT-5.4 Pro, ELO by dataset]

Detailed Metrics

Dataset breakdown

Performance metrics for each benchmark dataset, including quality scores and latency statistics (mean, minimum, and maximum).

SciFact

ELO: 1507
Win Rate: 27.6%
Record: 124W-230L-96T

Quality Metrics

Correctness: 4.87
Faithfulness: 4.90
Grounding: 4.87
Relevance: 4.93
Completeness: 4.87
Overall: 4.89

Latency Distribution

Mean: 2148ms
Min: 1111ms
Max: 3838ms

PG

ELO: 1277
Win Rate: 30.2%
Record: 136W-289L-25T

Quality Metrics

Correctness: 5.00
Faithfulness: 5.00
Grounding: 5.00
Relevance: 5.00
Completeness: 4.97
Overall: 4.99

Latency Distribution

Mean: 156451ms
Min: 57901ms
Max: 250411ms

MSMARCO

ELO: 1207
Win Rate: 16.0%
Record: 72W-327L-51T

Quality Metrics

Correctness: 4.97
Faithfulness: 5.00
Grounding: 5.00
Relevance: 4.93
Completeness: 4.73
Overall: 4.93

Latency Distribution

Mean: 68388ms
Min: 8911ms
Max: 165229ms
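
The per-dataset figures can be cross-checked against the overall numbers reported earlier. The sketch below sums the three win/loss/tie records and takes an unweighted mean of the three ELO ratings. The aggregation method is an assumption: the page does not say how the overall ELO is derived, although the unweighted mean happens to round to the reported 1330. The type and function names are hypothetical.

```typescript
interface DatasetResult {
  name: string;
  elo: number;
  wins: number;
  losses: number;
  ties: number;
}

// Figures copied from the dataset breakdown on this page.
const datasets: DatasetResult[] = [
  { name: "SciFact", elo: 1507, wins: 124, losses: 230, ties: 96 },
  { name: "PG", elo: 1277, wins: 136, losses: 289, ties: 25 },
  { name: "MSMARCO", elo: 1207, wins: 72, losses: 327, ties: 51 },
];

// Sums wins, losses, and ties across all datasets.
function totalRecord(results: DatasetResult[]): { wins: number; losses: number; ties: number } {
  return results.reduce(
    (acc, r) => ({
      wins: acc.wins + r.wins,
      losses: acc.losses + r.losses,
      ties: acc.ties + r.ties,
    }),
    { wins: 0, losses: 0, ties: 0 }
  );
}

// Unweighted mean ELO (assumed aggregation, not documented on this page).
function meanElo(results: DatasetResult[]): number {
  return results.reduce((sum, r) => sum + r.elo, 0) / results.length;
}

console.log(totalRecord(datasets)); // { wins: 332, losses: 846, ties: 172 }
console.log(Math.round(meanElo(datasets))); // 1330
```

Each dataset contributes 450 evaluations, so the three records add up to the 1350 total evaluations listed under Model Information.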

Build RAG in Minutes, Not Months

Agentset gives you a complete RAG API with top-ranked LLMs and smart retrieval built in. Upload your data, call the API, and get grounded answers from day one.


import { Agentset } from "agentset";

const agentset = new Agentset();
const ns = agentset.namespace("ns_1234");

const results = await ns.search(
  "What is multi-head attention?"
);

for (const result of results) {
  console.log(result.text);
}