
Reranker Leaderboard

Performance comparison of the top rerankers for Retrieval-Augmented Generation (RAG), tested on diverse datasets.

Last updated: November 25, 2025

[Leaderboard table: per-model ELO scores (1306–1644), evaluation metrics, pricing ($0.020–$0.050), and license (Apache 2.0, cc-by-nc-4.0, or Proprietary).]

Overview

Our Recommendation

We recommend Zerank 2 as the best overall reranker for production use.

Highest Win Rate

Wins more head-to-head matchups than any other model across all benchmarks.

Superior Speed

Runs faster than other top rerankers while keeping accuracy high, making it ideal for production.

Strong Accuracy

Delivers high nDCG and Recall scores. Surfaces the right context without missing key details.

Understanding Rerankers

What are rerankers?

The Two-Stage Retrieval Process

Rerankers are specialized models that enhance search quality in Retrieval-Augmented Generation (RAG) systems. They follow a two-stage retrieval process: first, an embedding model retrieves a set of potentially relevant documents; then, the reranker re-evaluates those results and reorders them so the most relevant documents appear at the top. This second step turns broad retrieval into precise context selection.
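
To make the pipeline concrete, here is a minimal Python sketch of two-stage retrieval, assuming the sentence-transformers library; the model names are illustrative placeholders, not entries from this leaderboard.

```python
# A minimal two-stage retrieval sketch (illustrative; assumes sentence-transformers).
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # stage 1: fast bi-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stage 2: cross-encoder

def two_stage_search(query: str, corpus: list[str],
                     k_retrieve: int = 50, k_final: int = 5) -> list[str]:
    # Stage 1: embed everything and keep the k_retrieve nearest documents by cosine similarity.
    doc_emb = embedder.encode(corpus, normalize_embeddings=True)
    query_emb = embedder.encode([query], normalize_embeddings=True)
    sims = (doc_emb @ query_emb.T).squeeze()
    candidates = [corpus[i] for i in np.argsort(-sims)[:k_retrieve]]

    # Stage 2: rescore each (query, document) pair with the cross-encoder and reorder.
    scores = reranker.predict([(query, doc) for doc in candidates])
    order = np.argsort(-np.asarray(scores))
    return [candidates[i] for i in order[:k_final]]
```

The key design point is that the expensive cross-encoder only ever sees the small candidate set, so the pipeline stays fast while the final ordering benefits from the stronger model.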

Why Rerankers Matter

Embedding models are fast but often miss subtle relevance. Rerankers apply cross-attention mechanisms to better understand the relationship between a query and a document. In practice, this delivers 15–40% higher retrieval accuracy and more relevant results compared to embeddings alone.

When to Use a Reranker

Rerankers matter when accuracy and context quality are critical. They're especially useful if your knowledge base contains similar or overlapping documents or if queries require nuanced reasoning. Most add only 100–600 ms of latency while sharply improving the documents passed to your LLM.

Selection Guide

Choosing the right reranker

For Maximum Accuracy

Choose top-performing models like Zerank 2 or Voyage Rerank 2.5. These models deliver the highest accuracy scores and are ideal for production applications where answer quality is paramount.

Best for:

  • Customer-facing chatbots
  • High-stakes decision support
  • Complex technical documentation

For Self-Hosting

Open-source models like Jina Reranker v2 and bge-reranker-v2-m3 offer excellent performance with full control over deployment. These models can be hosted on your infrastructure, ensuring data privacy and cost control.
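
As an illustration, a self-hosted setup can be as small as the sketch below. It assumes the FlagEmbedding package published alongside the bge-reranker models; the query and passages are made up for the example.

```python
# Minimal self-hosted reranking sketch, assuming the FlagEmbedding package
# (pip install FlagEmbedding). Query and passages are illustrative.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)  # runs locally on CPU or GPU

query = "How do rerankers improve RAG accuracy?"
candidates = [
    "Rerankers rescore retrieved documents with a cross-encoder.",
    "Embedding models map text to dense vectors for fast retrieval.",
]

# compute_score takes (query, passage) pairs and returns one relevance score per pair.
scores = reranker.compute_score([[query, doc] for doc in candidates])
ranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(ranked[0])
```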

Best for:

  • Data privacy requirements
  • High-volume applications
  • Custom fine-tuning needs

For Low Latency

Voyage Rerank 2.5 and Cohere Rerank 3.5 offer the fastest response times at around 595–603 ms average latency, making them ideal when response time is critical for your use case.

Best for:

  • Real-time chat applications
  • Mobile applications
  • High-concurrency scenarios

For Multilingual Support

Zerank 2 and Voyage Rerank 2.5 excel at cross-lingual reranking, handling queries and documents in multiple languages. Check individual model pages for specific language support details.

Best for:

  • International applications
  • Multilingual documentation
  • Cross-language search

Methodology

How We Evaluate Rerankers

The Reranker Leaderboard tests models on three datasets — financial queries, scientific claims, and essay-style content — to see how well they adapt to different retrieval patterns in RAG pipelines.

Testing Process

Each reranker is tested on the same FAISS-retrieved documents (top-50). We measure both ranking quality and latency, capturing the real-world balance between accuracy and speed.
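
Conceptually, the per-query loop looks like the sketch below. The `rerank` callable is a placeholder for whatever model API is under test, not our actual harness; FAISS supplies the same top-50 candidates to every model.

```python
import time

import faiss
import numpy as np

def evaluate_one_query(query: str, query_vec: np.ndarray, index: faiss.IndexFlatIP,
                       corpus: list[str], rerank, k: int = 50):
    """Retrieve the same FAISS top-k for every model, then time only the reranking step.

    `rerank(query, docs)` is a placeholder for the model API under test; it should
    return the candidate documents reordered by relevance.
    """
    # Stage 1: shared candidate set from the FAISS index (inner-product search).
    _, ids = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    candidates = [corpus[i] for i in ids[0]]

    # Stage 2: rerank those candidates and record wall-clock latency.
    start = time.perf_counter()
    ranked = rerank(query, candidates)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ranked, latency_ms
```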

ELO Score

For each query, GPT-5 compares two ranked lists and picks the more relevant one. Wins and losses feed into an ELO rating — higher scores mean more consistent wins.
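
For reference, the standard ELO update after a single pairwise judgment is shown below; the K-factor of 32 is an illustrative default, not necessarily the value used for this leaderboard.

```python
def elo_update(rating_a: float, rating_b: float, a_wins: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Standard ELO update after one pairwise judgment (list A vs. list B)."""
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    # Winner gains, loser loses, by the same amount scaled by how surprising the result was.
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

Applied over many pairwise judgments, the ratings converge so that a higher ELO reflects more consistent head-to-head wins.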

Evaluation Metrics

We measure nDCG@5/10 for ranking precision and Recall@5/10 for coverage. Together, they show how well a reranker surfaces relevant results at the top.
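
For readers who want to reproduce these metrics on their own data, here is a compact sketch of nDCG@k (linear-gain form) and Recall@k. It assumes graded relevance labels for nDCG and a set of known relevant document IDs for Recall.

```python
import numpy as np

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """nDCG@k for a ranked list of relevance labels (linear-gain DCG: rel_i / log2(i + 1))."""
    gains = np.asarray(relevances[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float((gains * discounts).sum())
    # Ideal DCG: the same labels sorted from most to least relevant.
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)
```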

Common questions

Reranker FAQ

What is a reranker?
A reranker refines an initial list of retrieved results by reordering them so the most relevant documents appear first. Unlike basic retrieval models, rerankers use deeper scoring methods to improve search precision and ranking quality.
Why do I need a reranker for RAG?
Rerankers make Retrieval-Augmented Generation (RAG) systems more accurate. They ensure your LLM receives the most relevant context, leading to better-grounded answers — especially when your knowledge base is large or overlapping.
How much do rerankers improve results?
In our benchmarks, rerankers improved retrieval accuracy by 15–40% compared to semantic search alone. That means cleaner context, fewer hallucinations, and more reliable RAG performance.
Why use ELO scoring for ranking?
ELO scoring measures how often one model outperforms another in direct comparisons. It reflects real-world consistency better than isolated metrics — a higher ELO means the model wins more head-to-head matchups across diverse queries.
Which datasets are used for evaluation?
We benchmark rerankers on three datasets — FiQA (finance), SciFact (science), and PG (long-form content). PG doesn't include labeled relevance data, so it's evaluated only with ELO-based LLM judgments, not traditional metrics like nDCG or Recall.
Should I use an open-source or proprietary reranker?
Open-source rerankers like Jina v2 offer great performance and full control for self-hosting. Proprietary options like Cohere provide slightly better accuracy and managed infrastructure. Choose based on your accuracy requirements and deployment preferences.