Rerankers

Explore Reranking Models

Browse our directory of reranking models, compare their performance, and choose the right reranker for your RAG application.

Understanding Rerankers

What are rerankers?

The Two-Stage Retrieval Process

Rerankers are specialized models that improve search quality in Retrieval-Augmented Generation (RAG) systems. They are used in a two-stage retrieval process: first, an embedding model retrieves a broad set of potentially relevant documents; then, the reranker scores each of those candidates against the query and reorders them so the most relevant documents appear at the top. This second step turns broad retrieval into precise context selection.
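The two-stage process can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: `embed` and `rerank_score` are hypothetical toy stand-ins (a bag-of-words vector and a bigram-overlap scorer), not real embedding or reranking models. What the sketch does show faithfully is the structure: a cheap score computed for every document, then a more expensive query-document score computed only for the short candidate list.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical toy embedding: a bag-of-words term-count vector.

    It is computed per text without seeing the query, and this toy
    version also ignores word order entirely.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Hypothetical toy reranker that reads the query and document together.

    It counts query bigrams that also occur in the document (so word order
    matters), plus a small bonus for each shared term.
    """
    q, d = query.lower().split(), doc.lower().split()
    shared_bigrams = set(zip(q, q[1:])) & set(zip(d, d[1:]))
    shared_terms = set(q) & set(d)
    return len(shared_bigrams) + 0.1 * len(shared_terms)


def two_stage_search(query, corpus, k_retrieve=3, k_final=2):
    # Stage 1: cheap recall. Score every document against the query
    # embedding and keep only the top k_retrieve candidates.
    q_vec = embed(query)
    candidates = sorted(corpus, key=lambda doc: cosine(q_vec, embed(doc)),
                        reverse=True)[:k_retrieve]
    # Stage 2: precise reordering. Run the pairwise scorer on the short
    # candidate list only and return the best k_final documents.
    return sorted(candidates, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:k_final]


corpus = [
    "man bites dog in city park",
    "dog bites man in city park",
    "city park opening hours",
    "how to train a dog",
]
print(two_stage_search("dog bites man", corpus))
# ['dog bites man in city park', 'man bites dog in city park']
```

In stage 1 the two park documents tie, because the toy embedding cannot see word order, and "man bites dog…" happens to come first. The reranker looks at the query and each candidate together and moves the document that actually matches the query to the top.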

Why Rerankers Matter

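Embedding models are fast because they encode the query and each document independently, but that independence means they often miss subtle relevance signals. Rerankers usually process the query and a candidate document together (a cross-encoder design), so the model's attention can relate query terms directly to document terms. Compared with embeddings alone, this typically yields 15–40% higher retrieval accuracy, although the exact gain depends on your data and models.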
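To check an accuracy gain like the 15–40% figure on your own data, compare a ranking metric before and after reranking. The page does not say which metric that figure uses; the sketch below assumes Mean Reciprocal Rank (MRR), one common choice, and the evaluation data is hypothetical.

```python
def reciprocal_rank(ranking, relevant):
    """1/position of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over a set of evaluation queries."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


# Hypothetical evaluation data: the relevant doc IDs for each query, plus the
# ranked doc IDs returned without and with the reranker.
relevant = [{"d2"}, {"d7"}, {"d4"}, {"d10"}]
embedding_only = [["d1", "d2", "d3"], ["d5", "d6", "d7"],
                  ["d4", "d8", "d9"], ["d10", "d11", "d12"]]
reranked = [["d2", "d1", "d3"], ["d5", "d7", "d6"],
            ["d4", "d9", "d8"], ["d10", "d12", "d11"]]

before = mean_reciprocal_rank(embedding_only, relevant)
after = mean_reciprocal_rank(reranked, relevant)
print(f"MRR before: {before:.3f}, after: {after:.3f}, "
      f"relative change: {after / before - 1:+.1%}")
# MRR before: 0.708, after: 0.875, relative change: +23.5%
```

Use a few hundred real queries with known relevant documents rather than four toy ones; with so few queries, a single reordering swings the metric substantially.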

When to Use a Reranker

Use a reranker when accuracy and context quality are critical. Rerankers are especially useful when your knowledge base contains similar or overlapping documents, or when queries require nuanced reasoning. Most add only 100–600 ms of latency while substantially improving the relevance of the documents passed to your LLM.
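Because a cross-encoder scores every query-document pair, added latency grows with the number of candidates you rerank, so it is worth measuring on your own workload. The sketch below times a reranking call and reports median and 95th-percentile latency. `fake_rerank` is a hypothetical stand-in that sleeps in proportion to the number of documents; replace it with your actual reranker call.

```python
import statistics
import time


def fake_rerank(query, docs, per_doc_seconds=0.0005):
    """Hypothetical stand-in for a reranker call.

    Sleeps in proportion to len(docs), mimicking a model that scores
    each query-document pair, and returns the documents unchanged.
    """
    time.sleep(per_doc_seconds * len(docs))
    return list(docs)


def latency_profile_ms(rerank, query, docs, runs=20):
    """Call `rerank` repeatedly and return (median, p95) latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        rerank(query, docs)
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return statistics.median(samples), p95


for n_candidates in (10, 40):
    docs = [f"document {i}" for i in range(n_candidates)]
    median_ms, p95_ms = latency_profile_ms(fake_rerank, "example query", docs)
    print(f"{n_candidates} candidates: median {median_ms:.1f} ms, p95 {p95_ms:.1f} ms")
```

Reporting a tail percentile alongside the median matters for interactive use, since occasional slow calls are what users notice.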

Selection Guide

Choosing the right reranker

For Maximum Accuracy

Choose top-performing proprietary models like Cohere Rerank 3.5 or Voyage Rerank 2.5. These models deliver the highest accuracy scores and are ideal for production applications where answer quality is paramount.

Best for:

  • Customer-facing chatbots
  • High-stakes decision support
  • Complex technical documentation

For Self-Hosting

Open-source models like Jina Reranker v2 and bge-reranker-v2-m3 offer excellent performance with full control over deployment. Because they run on your own infrastructure, you keep control over data privacy and costs.

Best for:

  • Data privacy requirements
  • High-volume applications
  • Custom fine-tuning needs

For Low Latency

Voyage Rerank 2.5 and Cohere Rerank 3.5 offer the fastest response times of the models compared here, at roughly 595–603 ms average latency, making them a good choice when response time is critical for your use case.

Best for:

  • Real-time chat applications
  • Mobile applications
  • High-concurrency scenarios

For Multilingual Support

Zerank 1 and Voyage Rerank 2.5 excel at cross-lingual reranking, handling queries and documents in multiple languages. Check individual model pages for specific language support details.

Best for:

  • International applications
  • Multilingual documentation
  • Cross-language search
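If you weigh several priorities at once, the selection guide above can be encoded as a small lookup that returns the models recommended for all of them. This is a hypothetical helper, not part of any library: the priority keys are invented labels, and the names are the display names used in this guide, not API model identifiers. An empty result signals a trade-off you need to resolve by hand.

```python
# Shortlists copied from this selection guide (display names, not API model IDs).
GUIDE = {
    "accuracy": ["Cohere Rerank 3.5", "Voyage Rerank 2.5"],
    "self_hosting": ["Jina Reranker v2", "bge-reranker-v2-m3"],
    "low_latency": ["Voyage Rerank 2.5", "Cohere Rerank 3.5"],
    "multilingual": ["Zerank 1", "Voyage Rerank 2.5"],
}


def shortlist(priorities):
    """Return models the guide recommends for every priority, or [] if none overlap.

    Raises KeyError for a priority label that is not in GUIDE.
    """
    if not priorities:
        return []
    common = set(GUIDE[priorities[0]])
    for priority in priorities[1:]:
        common &= set(GUIDE[priority])
    # Preserve the order in which the first priority's list names the models.
    return [model for model in GUIDE[priorities[0]] if model in common]


print(shortlist(["accuracy", "low_latency"]))     # ['Cohere Rerank 3.5', 'Voyage Rerank 2.5']
print(shortlist(["multilingual", "low_latency"]))  # ['Voyage Rerank 2.5']
print(shortlist(["self_hosting", "accuracy"]))     # []
```

Treat the output as a starting point and confirm details such as language coverage on the individual model pages.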