Cohere Rerank 4 Fast

Fast cross-encoder reranker for enterprise search and RAG, built for low-latency production workloads. It supports up to 32K tokens of context and multilingual retrieval across 100+ languages, with optional self-learning that adapts to your domain over time. If you want to compare the best rerankers for your data, try Agentset.

Leaderboard Rank

of 12

ELO Rating: 1510
Win Rate: 49.8%
Accuracy (nDCG@10): 0.094
Latency: 447ms

Model Information

Provider: Cohere
License: Proprietary
Price per 1M tokens: $0.050
Release Date: 2025-12-11
Model Name: rerank-v4.0-fast
Total Evaluations: 3300

Performance Record

Wins: 1643 (49.8%)
Losses: 1540 (46.7%)
Ties: 117 (3.5%)
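The percentages in the record follow from the raw counts if ties are kept in the denominator, which is an inference from the numbers rather than something the page states. A minimal sketch of that arithmetic (the `MatchRecord` type and `outcomeShares` helper are illustrative names, not part of any API):

```typescript
// Win/loss/tie counts as reported in the Performance Record.
interface MatchRecord {
  wins: number;
  losses: number;
  ties: number;
}

// Each outcome as a percentage of all evaluations, rounded to one decimal place.
// Assumption: ties count toward the total, so win rate = wins / (wins + losses + ties).
function outcomeShares(r: MatchRecord): { winRate: number; lossRate: number; tieRate: number } {
  const total = r.wins + r.losses + r.ties;
  const pct = (n: number) => Math.round((n / total) * 1000) / 10;
  return { winRate: pct(r.wins), lossRate: pct(r.losses), tieRate: pct(r.ties) };
}

const overall: MatchRecord = { wins: 1643, losses: 1540, ties: 117 };
const shares = outcomeShares(overall);
console.log(shares); // { winRate: 49.8, lossRate: 46.7, tieRate: 3.5 }
```

The three counts sum to 3300, matching the Total Evaluations figure.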

Rerankers Are Just One Piece of RAG

Agentset gives you a managed RAG pipeline with the top-ranked models and best practices baked in. No infrastructure to maintain, no reranking to configure.

Trusted by teams building production RAG applications

Documents: 5M+
Teams: 1,500+
Uptime: 99.9%

Performance Overview

ELO ratings by dataset

Cohere Rerank 4 Fast's ELO performance varies across different benchmark datasets, showing its strengths in specific domains.
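The page does not say how its ELO ratings are computed. For reading the gaps between ratings, the textbook Elo formulas are a reasonable reference point; the sketch below assumes the standard 400-point scale and an illustrative K-factor of 32, neither of which is confirmed by the leaderboard.

```typescript
// Standard Elo expected score of A against B (a value between 0 and 1).
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// One rating update. score is 1 for a win, 0.5 for a tie, 0 for a loss.
// k = 32 is an illustrative choice, not a value taken from the leaderboard.
function updateRating(rating: number, opponent: number, score: number, k = 32): number {
  return rating + k * (score - expectedScore(rating, opponent));
}

// Two models with equal ratings are expected to split their matchups evenly.
const even = expectedScore(1510, 1510);
// A win against an equally rated opponent moves the rating up by k / 2.
const after = updateRating(1510, 1510, 1);
console.log(even, after); // 0.5 1526
```

Under these assumptions, a higher rating on one dataset than another means the model is expected to win a larger share of pairwise comparisons there.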

[Chart: Cohere Rerank 4 Fast, ELO by dataset]

Detailed Metrics

Dataset breakdown

Performance metrics across different benchmark datasets, including accuracy and latency percentiles.
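For readers unfamiliar with the accuracy metrics, here is a minimal sketch of nDCG@k and Recall@k. It assumes the linear-gain form of DCG (some evaluations use an exponential gain of 2^rel - 1 instead), and the page does not state which variant it uses. All function names are illustrative.

```typescript
// Discounted cumulative gain over the first k relevance grades (linear gain).
function dcgAtK(grades: number[], k: number): number {
  return grades.slice(0, k).reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the returned ranking divided by the DCG of the ideal ordering.
// rankedGrades: relevance grade of each returned document, in ranked order.
// judgedGrades: relevance grades of every judged document for the query.
function ndcgAtK(rankedGrades: number[], judgedGrades: number[], k: number): number {
  const ideal = [...judgedGrades].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(rankedGrades, k) / idealDcg;
}

// Recall@k: fraction of all relevant documents that appear in the top k results.
function recallAtK(rankedIds: string[], relevantIds: Set<string>, k: number): number {
  if (relevantIds.size === 0) return 0;
  const hits = rankedIds.slice(0, k).filter((id) => relevantIds.has(id)).length;
  return hits / relevantIds.size;
}

// The single relevant document is ranked second instead of first.
const ndcg = ndcgAtK([0, 1, 0], [1], 10);
// One of the two relevant documents ("b") appears in the top 2.
const recall = recallAtK(["a", "b", "c"], new Set(["b", "d"]), 2);
console.log(ndcg.toFixed(3), recall);
```

Both metrics range from 0 to 1. nDCG rewards placing relevant documents near the top, while Recall only checks whether they appear within the cutoff.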

business reports

ELO 1603 | 56.2% WR | 309W-231L-10T

Accuracy Metrics

nDCG@5: 0.000
nDCG@10: 0.000
Recall@5: 0.000
Recall@10: 0.000

Latency Distribution

Mean: 428ms
P50 (Median): 408ms
P90: 550ms

DBPedia

ELO 1580 | 41.4% WR | 228W-282L-40T

Accuracy Metrics

nDCG@5: 0.000
nDCG@10: 0.000
Recall@5: 0.000
Recall@10: 0.000

Latency Distribution

Mean: 297ms
P50 (Median): 297ms
P90: 309ms

MSMARCO

ELO 1501 | 45.1% WR | 248W-251L-51T

Accuracy Metrics

nDCG@5: 0.000
nDCG@10: 0.000
Recall@5: 0.000
Recall@10: 0.000

Latency Distribution

Mean: 403ms
P50 (Median): 382ms
P90: 486ms

PG

ELO 1474 | 41.6% WR | 229W-321L-0T

Accuracy Metrics

nDCG@5: 0.000
nDCG@10: 0.000
Recall@5: 0.000
Recall@10: 0.000

Latency Distribution

Mean: 492ms
P50 (Median): 439ms
P90: 650ms

arguana

ELO 1472 | 62.2% WR | 342W-203L-5T

Accuracy Metrics

nDCG@5: 0.351
nDCG@10: 0.425
Recall@5: 0.660
Recall@10: 0.880

Latency Distribution

Mean: 574ms
P50 (Median): 562ms
P90: 728ms

FiQa

ELO 1429 | 52.2% WR | 287W-252L-11T

Accuracy Metrics

nDCG@5: 0.135
nDCG@10: 0.138
Recall@5: 0.125
Recall@10: 0.130

Latency Distribution

Mean: 485ms
P50 (Median): 459ms
P90: 624ms
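
The headline accuracy (0.094) and latency (447ms) are consistent with unweighted means of the six per-dataset values. That is an inference from the numbers, not something the page states. A short sketch of the check, using the per-dataset nDCG@10 and mean latency figures listed in the dataset breakdown:

```typescript
interface DatasetMetrics {
  name: string;
  ndcg10: number;
  meanLatencyMs: number;
}

// Values copied from the dataset breakdown.
const datasets: DatasetMetrics[] = [
  { name: "business reports", ndcg10: 0.0, meanLatencyMs: 428 },
  { name: "DBPedia", ndcg10: 0.0, meanLatencyMs: 297 },
  { name: "MSMARCO", ndcg10: 0.0, meanLatencyMs: 403 },
  { name: "PG", ndcg10: 0.0, meanLatencyMs: 492 },
  { name: "arguana", ndcg10: 0.425, meanLatencyMs: 574 },
  { name: "FiQa", ndcg10: 0.138, meanLatencyMs: 485 },
];

// Plain arithmetic mean; assumes every dataset is weighted equally.
const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

const avgNdcg10 = mean(datasets.map((d) => d.ndcg10));
const avgLatencyMs = mean(datasets.map((d) => d.meanLatencyMs));

console.log(avgNdcg10.toFixed(3)); // "0.094"
console.log(Math.round(avgLatencyMs)); // 447
```

Because four of the six datasets report nDCG@10 as 0.000, the overall accuracy figure is driven entirely by arguana and FiQa.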

Build RAG in Minutes, Not Months

Agentset gives you a complete RAG API with top-ranked rerankers and embedding models built in. Upload your data, call the API, and get accurate results from day one.

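import { Agentset } from "agentset";

// Create a client and select a namespace ("ns_1234" is a placeholder ID).
const agentset = new Agentset();
const ns = agentset.namespace("ns_1234");

// Run a search query against the namespace.
const results = await ns.search(
  "What is multi-head attention?"
);

// Print the text of each returned result.
for (const result of results) {
  console.log(result.text);
}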