Leaderboard

Best Embedding Models for RAG

The definitive ranking of embedding models for Retrieval-Augmented Generation (RAG). Compare performance across proprietary and open-source models.

Last updated: January 15, 2025

| Rank | Model Name | Score | Provider | License | Dimensions |
|------|------------|-------|----------|---------|------------|
| 🥇 1 | text-embedding-3-large | 95.1 | OpenAI | Proprietary | 3072 |
| 🥈 2 | Voyage AI v2 | 93.8 | Voyage AI | Proprietary | 1536 |
| 🥉 3 | Cohere embed-v3 | 92.4 | Cohere | Proprietary | 1024 |
| 4 | E5-mistral-7b-instruct | 90.7 | Microsoft | Open Source | 4096 |
| 5 | BGE-large-en-v1.5 | 89.2 | BAAI | Open Source | 1024 |
| 6 | Jina Embeddings v2 | 87.9 | Jina AI | Open Source | 768 |
| 7 | text-embedding-3-small | 86.5 | OpenAI | Proprietary | 1536 |
| 8 | Nomic embed-text-v1.5 | 85.1 | Nomic AI | Open Source | 768 |
| 9 | GTE-large | 83.8 | Alibaba | Open Source | 1024 |

Key Insights

What the data tells us

Proprietary models lead in quality

text-embedding-3-large achieves the highest retrieval score in this ranking (95.1), with Voyage AI v2 (93.8) and Cohere embed-v3 (92.4) close behind. These top models excel at semantic search across diverse domains and perform consistently on both long and short queries.

Open source catching up

Models like BGE-large-en-v1.5 and E5-mistral-7b-instruct show that open-source embeddings can compete with proprietary options while offering full control and cost efficiency.

Dimensions matter less

A higher dimension count doesn't guarantee better performance. Several of the top performers use 1024 dimensions, showing that efficient architectures can beat raw size.

Methodology

How we rank embedding models

Quality scores are based on MTEB (Massive Text Embedding Benchmark) retrieval benchmarks, which evaluate models on semantic search tasks across diverse datasets. Scores are normalized to a 0-100 scale where 100 represents the best performance. We focus on retrieval-specific metrics that matter most for RAG applications.

Testing Process

Each embedding model is evaluated on retrieval tasks from MTEB, including question answering, document retrieval, and semantic similarity. We measure how well models encode semantic meaning for finding relevant documents.
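
For readers who want to reproduce this style of evaluation locally, a minimal sketch using the open-source mteb and sentence-transformers packages is shown below. The model name, task choice, and output path are illustrative, the exact API can vary across mteb versions, and this is not the configuration behind the table above.

```python
# Illustrative only: evaluate one open-source embedding model on a single
# MTEB retrieval task. Task names and API details may differ by mteb version.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers-compatible model works; BAAI/bge-large-en-v1.5
# is one of the open-source entries in the leaderboard above.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# NFCorpus is one of the standard MTEB retrieval tasks; swap in other
# retrieval tasks (question answering, document retrieval) as needed.
evaluation = MTEB(tasks=["NFCorpus"])
results = evaluation.run(model, output_folder="results/bge-large-en-v1.5")
print(results)
```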

Evaluation Metrics

Primary metrics include nDCG@10 and MRR (Mean Reciprocal Rank) on retrieval tasks. These measure how effectively models place relevant documents at the top of search results.
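
To make these metrics concrete, the sketch below computes nDCG@10 and the reciprocal rank for a single query from a ranked list of relevance labels. It is a simplified reference implementation for illustration, not the benchmark's own scoring code.

```python
import math

def ndcg_at_10(relevance):
    """nDCG@10 for one query. `relevance` lists the relevance labels of the
    retrieved documents, best-ranked first (binary or graded)."""
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevance[:10]))
    ideal = sorted(relevance, reverse=True)[:10]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def reciprocal_rank(relevance):
    """Per-query contribution to MRR: 1 / rank of the first relevant
    document, or 0 if no relevant document was retrieved."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

# Example: the 2nd and 5th retrieved documents are relevant.
ranked_labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(ndcg_at_10(ranked_labels))       # ~0.61
print(reciprocal_rank(ranked_labels))  # 0.5
```

MRR is then the mean of these per-query reciprocal ranks across the whole query set, just as the reported nDCG@10 is averaged over queries.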

Score Calculation

Scores are normalized relative to the best-performing model in our tests. A score of 100 represents peak performance, while lower scores indicate proportional decreases in retrieval accuracy.
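
As a worked illustration of this normalization, the snippet below rescales raw retrieval scores (for example, average nDCG@10 across tasks) so the best-performing model maps to 100. The model names and raw values are placeholders, not actual measurements.

```python
# Rescale raw retrieval scores so the best model in the run maps to 100 and
# the rest scale proportionally. All values here are hypothetical.
raw_scores = {
    "model-a": 0.61,  # best performer in this hypothetical run
    "model-b": 0.58,
    "model-c": 0.54,
}

best = max(raw_scores.values())
normalized = {name: round(100 * score / best, 1) for name, score in raw_scores.items()}
print(normalized)  # {'model-a': 100.0, 'model-b': 95.1, 'model-c': 88.5}
```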

FAQ

Common questions

What is an embedding model?
An embedding model converts text into dense numerical vectors that capture semantic meaning. These vectors enable similarity search, allowing RAG systems to find relevant documents by comparing meaning rather than just keywords (see the minimal code sketch at the end of this FAQ).
How do embedding dimensions affect performance?
More dimensions can capture more nuanced information, but don't always lead to better results. Models with 768-1536 dimensions often perform as well as larger ones while being faster and more efficient.
Should I use a proprietary or open-source embedding model?
Proprietary models like Voyage and OpenAI offer top performance with managed infrastructure. Open-source options like BGE-large-en-v1.5 provide strong results with full control and lower long-term costs. Choose based on your accuracy needs and deployment preferences.
How often should I update my embedding model?
Upgrading requires re-embedding your entire corpus, which can be time-consuming. Only upgrade when there's a significant performance improvement (5+ points) or when your RAG quality noticeably degrades.