GPT-5.4
GPT-5.4 is a frontier model with a 1M+ token context window, aimed at complex professional work. It supports native computer use, web search, file search, code interpreter, and MCP integration, with adjustable reasoning effort levels (none to xhigh). It is listed with a 33% reduction in hallucinations compared with GPT-5.2. If you want to compare the best LLMs for your data, try Agentset.
Model Information
- Provider: OpenAI
- License: Proprietary
- Input Price per 1M tokens: $2.50
- Output Price per 1M tokens: $15.00
- Context Window: 1050K tokens
- Release Date: 2026-03-05
- Model Name: gpt-5.4
- Total Evaluations: 1350
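As a rough illustration of what the listed prices mean per request, the sketch below estimates cost from token counts. It assumes pricing scales linearly with tokens and ignores any caching discounts or long-context tiers, which this page does not describe. The helper name `estimateCostUsd` and the example token counts are hypothetical.

```typescript
// Listed prices from the model information above, in USD per 1M tokens.
const INPUT_PRICE_PER_1M = 2.5;
const OUTPUT_PRICE_PER_1M = 15.0;

// Hypothetical helper: estimate the cost of a single request in USD,
// assuming flat per-token pricing (an assumption, not a billing rule).
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_1M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_1M
  );
}

// Example: a RAG request with 8,000 prompt tokens and 500 answer tokens.
const exampleCost = estimateCostUsd(8_000, 500);
console.log(`Estimated cost: $${exampleCost.toFixed(4)}`);
```

Under these assumptions, output tokens cost six times as much as input tokens, so long answers affect the bill more than long retrieved contexts of the same size.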
LLMs Are Just One Piece of RAG
Agentset gives you a managed RAG pipeline with the top-ranked models and best practices baked in. No infrastructure to maintain, no LLM orchestration to manage.
Trusted by teams building production RAG applications
Performance Overview
ELO ratings by dataset
GPT-5.4's ELO performance varies across different benchmark datasets, showing its strengths in specific domains.
[Chart: GPT-5.4 ELO by dataset]
Detailed Metrics
Dataset breakdown
Performance metrics across different benchmark datasets, including quality scores and latency statistics (mean, min, and max).
SciFact
Quality Metrics
- Correctness: 4.87
- Faithfulness: 4.87
- Grounding: 4.87
- Relevance: 4.93
- Completeness: 4.80
- Overall: 4.87
Latency Distribution
- Mean: 2165 ms
- Min: 1207 ms
- Max: 4297 ms
MSMARCO
Quality Metrics
- Correctness: 4.97
- Faithfulness: 4.97
- Grounding: 4.97
- Relevance: 4.93
- Completeness: 4.80
- Overall: 4.93
Latency Distribution
- Mean: 1861 ms
- Min: 888 ms
- Max: 3548 ms
PG
Quality Metrics
- Correctness: 5.00
- Faithfulness: 5.00
- Grounding: 5.00
- Relevance: 5.00
- Completeness: 5.00
- Overall: 5.00
Latency Distribution
- Mean: 5296 ms
- Min: 2948 ms
- Max: 17651 ms
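The page does not say how the Overall score is derived. The listed values are, however, consistent with an unweighted mean of the five quality metrics rounded to two decimals, which the sketch below checks. The type and function names are hypothetical, and the averaging rule is an assumption inferred from the numbers, not a documented method.

```typescript
// Hypothetical shape for one dataset's quality metrics.
type QualityScores = {
  correctness: number;
  faithfulness: number;
  grounding: number;
  relevance: number;
  completeness: number;
};

// Unweighted mean of the five metrics (assumed aggregation rule).
function meanScore(s: QualityScores): number {
  const values = [
    s.correctness,
    s.faithfulness,
    s.grounding,
    s.relevance,
    s.completeness,
  ];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Scores copied from the dataset breakdown above.
const sciFact: QualityScores = {
  correctness: 4.87, faithfulness: 4.87, grounding: 4.87,
  relevance: 4.93, completeness: 4.8,
};
const msMarco: QualityScores = {
  correctness: 4.97, faithfulness: 4.97, grounding: 4.97,
  relevance: 4.93, completeness: 4.8,
};

console.log(meanScore(sciFact).toFixed(2)); // 4.87, matches the listed Overall
console.log(meanScore(msMarco).toFixed(2)); // 4.93, matches the listed Overall
```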
Build RAG in Minutes, Not Months
Agentset gives you a complete RAG API with top-ranked LLMs and smart retrieval built in. Upload your data, call the API, and get grounded answers from day one.
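import { Agentset } from "agentset";

// Create a client and pick the namespace to query.
const agentset = new Agentset();
const ns = agentset.namespace("ns_1234");

// Run a search query against the namespace.
const results = await ns.search("What is multi-head attention?");

// Print the text of each returned result.
for (const result of results) {
  console.log(result.text);
}
Compare Models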
See how it stacks up
Compare GPT-5.4 with other top LLMs to understand the differences in performance, accuracy, and latency.