Reranker Leaderboard
Performance comparison of the top rerankers for Retrieval-Augmented Generation (RAG), tested on diverse datasets.
Last updated: November 4, 2025
| Rank | Model | ELO Score | nDCG | Latency (ms) | Price | License |
|---|---|---|---|---|---|---|
| 🥇 1 | Zerank 1 | 1642 | 0.676 | 1126 | $0.025 | cc-by-nc-4.0 |
| 🥈 2 | Voyage AI Rerank 2.5 | 1629 | 0.680 | 610 | $0.050 | Proprietary |
| 🥉 3 | Contextual AI Rerank v2 Instruct | 1550 | 0.687 | 3010 | $0.050 | cc-by-nc-4.0 |
| 4 | Voyage AI Rerank 2.5 Lite | 1510 | 0.679 | 607 | $0.020 | Proprietary |
| 5 | BAAI/BGE Reranker v2 M3 | 1468 | 0.686 | 1891 | $0.020 | Apache 2.0 |
| 6 | Zerank 1 Small | 1458 | 0.676 | 1109 | $0.025 | Apache 2.0 |
| 7 | Cohere Rerank 3.5 | 1403 | 0.689 | 492 | $0.050 | Proprietary |
| 8 | Jina Reranker v2 Base Multilingual | 1335 | 0.671 | 1411 | $0.045 | cc-by-nc-4.0 |
Overview
Our Recommendation
We recommend Voyage AI Rerank 2.5 as the best overall reranker for production use.
Highest Win Rate
Wins more head-to-head matchups than any other model across all benchmarks.
Superior Speed
Runs faster than other top rerankers while keeping accuracy high, making it well suited for production.
Strong Accuracy
Delivers high nDCG and Recall scores. Surfaces the right context without missing key details.
Methodology
How We Evaluate Rerankers
The Reranker Leaderboard tests models on three datasets — financial queries, scientific claims, and essay-style content — to see how well they adapt to different retrieval patterns in RAG pipelines.
Testing Process
Each reranker is tested on the same FAISS-retrieved documents (top-50). We measure both ranking quality and latency, capturing the real-world balance between accuracy and speed.
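As a rough illustration, a harness along these lines feeds every model the identical FAISS candidates and times each call. The `rerank(query, docs)` callable, the timing approach, and the top-10 cutoff are assumptions for the sketch, not our exact evaluation code.

```python
import time

def evaluate_reranker(rerank, queries, faiss_top50):
    """Run one reranker over the shared FAISS candidates and record latency.

    rerank      -- callable (query, docs) -> docs sorted by relevance (hypothetical signature)
    queries     -- list of query strings
    faiss_top50 -- dict mapping each query to the same 50 FAISS-retrieved docs for every model
    """
    results, latencies = {}, []
    for query in queries:
        candidates = faiss_top50[query]          # identical input for every reranker
        start = time.perf_counter()
        ranked = rerank(query, candidates)       # model under test reorders the 50 docs
        latencies.append(time.perf_counter() - start)
        results[query] = ranked[:10]             # keep top-10 for nDCG@5/10 and Recall@5/10
    avg_latency_ms = 1000 * sum(latencies) / len(latencies)
    return results, avg_latency_ms
```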
ELO Score
For each query, GPT-5 compares two ranked lists and picks the more relevant one. Wins and losses feed into an ELO rating — higher scores mean more consistent wins.
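The rating update itself follows the standard ELO formula. The sketch below assumes a K-factor of 32, a common default rather than our exact setting.

```python
def elo_update(rating_a, rating_b, a_wins, k=32):
    """Update two ELO ratings after one pairwise judgment.

    a_wins -- True if the judge preferred reranker A's ranked list for this query
    k      -- step size; 32 is a common default (assumption, not our exact setting)
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))  # predicted win probability for A
    expected_b = 1 - expected_a
    score_a, score_b = (1.0, 0.0) if a_wins else (0.0, 1.0)
    return rating_a + k * (score_a - expected_a), rating_b + k * (score_b - expected_b)
```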
Evaluation Metrics
We measure nDCG@5/10 for ranking precision and Recall@5/10 for coverage. Together, they show how well a reranker surfaces relevant results at the top.
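For reference, with binary relevance labels the two metrics can be computed roughly as follows; the helper names are illustrative, not our evaluation code.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance: reward relevant docs more the earlier they appear."""
    dcg = sum(1 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Recall@k: fraction of all relevant docs that made it into the top k."""
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0
```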
Common questions
Reranker FAQ
- What is a reranker?
- A reranker refines an initial list of retrieved results by reordering them so the most relevant documents appear first. Unlike basic retrieval models, rerankers use deeper scoring methods to improve search precision and ranking quality.
- Why do I need a reranker for RAG?
- Rerankers make Retrieval-Augmented Generation (RAG) systems more accurate. They ensure your LLM receives the most relevant context, leading to better-grounded answers — especially when your knowledge base is large or overlapping.
- How much do rerankers improve results?
- In our benchmarks, rerankers improved retrieval accuracy by 15–40% compared to semantic search alone. That means cleaner context, fewer hallucinations, and more reliable RAG performance.
- Why use ELO scoring for ranking?
- ELO scoring measures how often one model outperforms another in direct comparisons. It reflects real-world consistency better than isolated metrics — a higher ELO means the model wins more head-to-head matchups across diverse queries.
- Which datasets are used for evaluation?
- We benchmark rerankers on three datasets — FiQA (finance), SciFact (science), and PG (long-form content). PG doesn’t include labeled relevance data, so it’s evaluated only with ELO-based LLM judgments, not traditional metrics like nDCG or Recall.
- Should I use an open-source or proprietary reranker?
- Open-source rerankers like Jina v2 offer strong performance and full control for self-hosting (see the sketch after this list). Proprietary options like Cohere provide slightly better accuracy and managed infrastructure. Choose based on your accuracy requirements and deployment preferences.
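As a rough starting point for self-hosting, a cross-encoder reranker from the table can be run locally with the sentence-transformers `CrossEncoder` class. The model choice, example documents, and `top_k` below are illustrative; check each model's license before production use.

```python
from sentence_transformers import CrossEncoder

# Load an open-source cross-encoder reranker (illustrative model choice).
model = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(query, docs, top_k=5):
    """Score each (query, doc) pair and return the top_k docs by relevance."""
    scores = model.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

candidates = [
    "The company's Q3 revenue grew 12% year over year.",
    "Photosynthesis converts light energy into chemical energy.",
    "Gross margin expanded due to lower input costs.",
]
print(rerank("How did the company's finances change last quarter?", candidates, top_k=2))
```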