GPT-OSS 120B

A 131K-token context window under an Apache 2.0 license for full customization and self-hosting. Configurable reasoning depth via <think> tags, and deployment on a single 80GB GPU for self-hosted RAG. If you want to compare the best LLMs for your data, try Agentset.

Leaderboard Rank: #16 of 16
ELO Rating: 1242 (#16)
Win Rate: 14.2% (#16)
Latency: 11302ms (#8)

Model Information

Provider
OpenAI
License
Open Source
Input Price per 1M
$0.04
Output Price per 1M
$0.19
Context Window
131K
Release Date
2025-08-05
Model Name
gpt-oss-120b
Total Evaluations
1350
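Given the listed rates of $0.04 per 1M input tokens and $0.19 per 1M output tokens, per-query cost is simple arithmetic. A minimal sketch, where the token counts in the example are hypothetical:

```typescript
// Rates from the table above ($ per 1M tokens)
const INPUT_PRICE_PER_M = 0.04;
const OUTPUT_PRICE_PER_M = 0.19;

// Cost in dollars for a single request
function queryCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M
  );
}

// e.g. a RAG query with 4,000 tokens of retrieved context and a 500-token answer:
console.log(queryCost(4_000, 500).toFixed(6)); // → "0.000255"
```

At these prices, even context-heavy RAG queries cost a small fraction of a cent.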

Performance Record

Wins: 192 (14.2%)
Losses: 1029 (76.2%)
Ties: 129 (9.6%)
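The headline percentages follow directly from the raw record; a quick sketch of the arithmetic (totals match the 1,350 evaluations listed above):

```typescript
// Raw head-to-head record from the Performance Record section
const wins = 192;
const losses = 1029;
const ties = 129;

const total = wins + losses + ties; // 1350 evaluations

// Each outcome's share of all evaluations, to one decimal place
const winRate = ((100 * wins) / total).toFixed(1);   // "14.2"
const lossRate = ((100 * losses) / total).toFixed(1); // "76.2"
const tieRate = ((100 * ties) / total).toFixed(1);    // "9.6"

console.log(`${winRate}% W / ${lossRate}% L / ${tieRate}% T`);
```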

LLMs Are Just One Piece of RAG

Agentset gives you a managed RAG pipeline with the top-ranked models and best practices baked in. No infrastructure to maintain, no LLM orchestration to manage.

Trusted by teams building production RAG applications

5M+
Documents
1,500+
Teams
99.9%
Uptime

Performance Overview

ELO ratings by dataset

GPT-OSS 120B's ELO performance varies across benchmark datasets, revealing where it is relatively stronger and weaker.
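For readers unfamiliar with ELO, ratings are updated from pairwise comparisons: the winner gains points in proportion to how unexpected the win was. The sketch below is the standard Elo formula; the exact rating method Agentset uses is not specified on this page, so treat it as illustrative:

```typescript
// Probability that a player rated `ra` beats a player rated `rb` (standard Elo)
function expectedScore(ra: number, rb: number): number {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

// Update both ratings after a game; scoreA is 1 (win), 0.5 (tie), or 0 (loss)
function updateElo(
  ra: number,
  rb: number,
  scoreA: number,
  k = 32 // K-factor: maximum rating swing per game
): [number, number] {
  const ea = expectedScore(ra, rb);
  const newRa = ra + k * (scoreA - ea);
  const newRb = rb + k * (1 - scoreA - (1 - ea));
  return [newRa, newRb];
}

// Two equally rated models: a win moves the winner up by k/2 = 16 points
console.log(updateElo(1200, 1200, 1)); // → [1216, 1184]
```

A rating of 1242 against higher-rated peers is consistent with the low win rate shown above: upsets move ratings a lot, but they have to happen first.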

GPT-OSS 120B - ELO by Dataset

Detailed Metrics

Dataset breakdown

Performance metrics across different benchmark datasets, including accuracy and latency percentiles.

PG

ELO: 1282 · Win Rate: 23.8% · Record: 107W-332L-11T

Quality Metrics

Correctness
4.87
Faithfulness
4.87
Grounding
4.87
Relevance
4.90
Completeness
4.83
Overall
4.87

Latency Distribution

Mean
19128ms
Min
1317ms
Max
69491ms

MSMARCO

ELO: 1254 · Win Rate: 13.3% · Record: 60W-340L-50T

Quality Metrics

Correctness
4.93
Faithfulness
4.93
Grounding
4.93
Relevance
4.97
Completeness
4.80
Overall
4.91

Latency Distribution

Mean
5616ms
Min
1255ms
Max
20330ms

SciFact

ELO: 1190 · Win Rate: 5.6% · Record: 25W-357L-68T

Quality Metrics

Correctness
4.70
Faithfulness
4.80
Grounding
4.80
Relevance
4.73
Completeness
4.57
Overall
4.72

Latency Distribution

Mean
9160ms
Min
1606ms
Max
35709ms

Build RAG in Minutes, Not Months

Agentset gives you a complete RAG API with top-ranked LLMs and smart retrieval built in. Upload your data, call the API, and get grounded answers from day one.

import { Agentset } from "agentset";

// Configure your Agentset API key before running
const agentset = new Agentset();

// Select the namespace that holds your uploaded documents
const ns = agentset.namespace("ns_1234");

// Semantic search over the namespace
const results = await ns.search(
  "What is multi-head attention?"
);

for (const result of results) {
  console.log(result.text);
}

Compare Models

See how it stacks up

Compare GPT-OSS 120B with other top LLMs to understand the differences in performance, accuracy, and latency.