| Rank | Model Name | Score | Provider | License | Dimensions |
|---|---|---|---|---|---|
| 🥇1 | text-embedding-3-large | 95.1 | OpenAI | Proprietary | 3072 |
| 🥈2 | Voyage AI v2 | 93.8 | Voyage AI | Proprietary | 1536 |
| 🥉3 | Cohere embed-v3 | 92.4 | Cohere | Proprietary | 1024 |
| 4 | E5-mistral-7b-instruct | 90.7 | Microsoft | Open Source | 4096 |
| 5 | BGE-large-en-v1.5 | 89.2 | BAAI | Open Source | 1024 |
| 6 | Jina Embeddings v2 | 87.9 | Jina AI | Open Source | 768 |
| 7 | text-embedding-3-small | 86.5 | OpenAI | Proprietary | 1536 |
| 8 | Nomic embed-text-v1.5 | 85.1 | Nomic AI | Open Source | 768 |
| 9 | GTE-large | 83.8 | Alibaba | Open Source | 1024 |
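For orientation, here is a minimal sketch of how the top-ranked model is typically called through the OpenAI Python SDK; the model name and its 3072 dimensions come from the table above, while the client setup and example passages are assumptions for illustration.

```python
# Minimal sketch: embedding passages with text-embedding-3-large via the
# OpenAI Python SDK. Assumes OPENAI_API_KEY is set in the environment;
# the passages are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passages = [
    "Retrieval-augmented generation grounds LLM answers in your own documents.",
    "Embedding models map text to dense vectors for semantic search.",
]

response = client.embeddings.create(
    model="text-embedding-3-large",
    input=passages,
)

vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of dimension", len(vectors[0]))  # expect 3072 dimensions
```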
## Key Insights
Here's what the data tells us.
### Proprietary models lead in quality
OpenAI's text-embedding-3-large tops the leaderboard with a score of 95.1, followed closely by Voyage AI and Cohere. The leading proprietary models excel at semantic search across diverse domains and perform consistently on both long and short queries.
### Open source is catching up
Models like E5-mistral-7b-instruct and BGE-large-en-v1.5 prove that open-source embeddings can compete with proprietary options while offering full control and cost efficiency.
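As a concrete illustration of the self-hosted route, here is a minimal sketch that loads BGE-large-en-v1.5 from the table via the sentence-transformers library; the example query and documents are assumptions.

```python
# Minimal sketch: self-hosting an open-source embedding model with
# sentence-transformers. The model ID matches BGE-large-en-v1.5 from the
# leaderboard; the texts are placeholders.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = [
    "Vector databases store embeddings for fast similarity search.",
    "BM25 is a classic lexical ranking function.",
]
query = "How do I search documents by meaning?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ query_vec
print(scores)  # the vector-database sentence should score higher
```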
### Dimensions matter less
A higher dimension count doesn't guarantee better performance. Several top performers use 1024 or fewer dimensions, showing that efficient architectures can beat raw size.
## Methodology
Here's how we rank embedding models.
Quality scores are based on MTEB (Massive Text Embedding Benchmark) retrieval benchmarks, which evaluate models on semantic search tasks across diverse datasets. Scores are normalized to a 0-100 scale where 100 represents the best performance. We focus on retrieval-specific metrics that matter most for RAG applications.
### Testing Process
Each embedding model is evaluated on retrieval tasks from MTEB, including question answering, document retrieval, and semantic similarity. We measure how well models encode semantic meaning for finding relevant documents.
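For readers who want to reproduce this kind of evaluation, the sketch below shows the general shape of running a single MTEB retrieval task with the open-source `mteb` package; the task choice, model, and output folder are assumptions, and the exact API can vary between package versions.

```python
# Sketch of evaluating an embedding model on one MTEB retrieval task.
# The task ("SciFact"), model, and output folder are example choices;
# check the mteb documentation for the API of the version you install.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

evaluation = MTEB(tasks=["SciFact"])  # a single retrieval task as an example
results = evaluation.run(model, output_folder="results/bge-large-en-v1.5")
print(results)
```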
### Evaluation Metrics
Primary metrics include nDCG@10 (normalized Discounted Cumulative Gain at rank 10) and MRR (Mean Reciprocal Rank) on retrieval tasks. These measure how effectively models place relevant documents at the top of search results.
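To make these metrics concrete, here is a small self-contained sketch that computes nDCG@10 and MRR for toy ranked lists; the relevance labels are invented for illustration.

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one query: ranked relevance labels compared to the ideal ranking."""
    rels = relevances[:k]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_relevances):
    """Mean Reciprocal Rank over queries; each entry is a ranked 0/1 relevance list."""
    total = 0.0
    for rels in ranked_relevances:
        for i, rel in enumerate(rels):
            if rel > 0:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_relevances)

# Toy relevance judgments for the top-ranked results of a few queries.
print(round(ndcg_at_k([3, 2, 0, 1, 0], k=10), 3))  # close to 1.0 means a near-ideal ranking
print(mrr([[0, 1, 0], [1, 0, 0]]))                  # first relevant hit at ranks 2 and 1 -> 0.75
```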
### Score Calculation
Scores are normalized relative to the best-performing model in our tests. A score of 100 represents peak performance, while lower scores indicate proportional decreases in retrieval accuracy.
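As a worked illustration of that normalization, the snippet below rescales a set of raw benchmark averages against the best performer; the raw numbers are placeholders, not our measured results.

```python
# Sketch of the relative normalization described above.
# The raw averages are placeholders, not measured results.
raw_scores = {
    "model-a": 56.8,
    "model-b": 54.1,
    "model-c": 49.9,
}

best = max(raw_scores.values())
normalized = {name: round(score / best * 100, 1) for name, score in raw_scores.items()}
print(normalized)  # the best model maps to 100.0; the rest scale proportionally
```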
## FAQ
Common questions about embedding models for RAG.
- **What is an embedding model?** An embedding model converts text into dense numerical vectors that capture semantic meaning. These vectors enable similarity search, allowing RAG systems to find relevant documents by comparing meaning rather than just keywords (see the sketch after this list).
- **How do embedding dimensions affect performance?** More dimensions can capture more nuanced information, but they don't always lead to better results. Models with 768-1536 dimensions often perform as well as larger ones while being faster and more efficient.
- **Should I use a proprietary or open-source embedding model?** Proprietary models like OpenAI's text-embedding-3-large and Voyage offer top performance with managed infrastructure. Open-source options like BGE-large-en-v1.5 provide strong results with full control and lower long-term costs. Choose based on your accuracy needs and deployment preferences.
- **How often should I update my embedding model?** Upgrading requires re-embedding your entire corpus, which can be time-consuming. Only upgrade when there's a significant performance improvement (5+ points on this leaderboard's scale) or when your RAG quality noticeably degrades.
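To illustrate the similarity search described in the first question above, here is a minimal sketch that ranks documents by cosine similarity between embedding vectors; the tiny four-dimensional vectors are made-up placeholders standing in for real embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 4-dimensional "embeddings"; a real model produces hundreds or
# thousands of dimensions (see the Dimensions column in the table above).
doc_vectors = {
    "returns policy": np.array([0.9, 0.1, 0.0, 0.2]),
    "shipping times": np.array([0.1, 0.8, 0.3, 0.0]),
}
query_vector = np.array([0.85, 0.15, 0.05, 0.1])  # e.g. "can I send an item back?"

ranked = sorted(
    doc_vectors.items(),
    key=lambda kv: cosine_similarity(query_vector, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # "returns policy" ranks first even without shared keywords
```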