| Rank | Model Name | Score | Provider | License | Dimensions |
|---|---|---|---|---|---|
| 🥇1 | text-embedding-3-large | 95.1 | OpenAI | Proprietary | 3072 |
| 🥈2 | Voyage AI v2 | 93.8 | Voyage AI | Proprietary | 1536 |
| 🥉3 | Cohere embed-v3 | 92.4 | Cohere | Proprietary | 1024 |
| 4 | E5-mistral-7b-instruct | 90.7 | Microsoft | Open Source | 4096 |
| 5 | BGE-large-en-v1.5 | 89.2 | BAAI | Open Source | 1024 |
| 6 | Jina Embeddings v2 | 87.9 | Jina AI | Open Source | 768 |
| 7 | text-embedding-3-small | 86.5 | OpenAI | Proprietary | 1536 |
| 8 | Nomic embed-text-v1.5 | 85.1 | Nomic AI | Open Source | 768 |
| 9 | GTE-large | 83.8 | Alibaba | Open Source | 1024 |
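For orientation, here is a minimal sketch of how the top-ranked model is typically called through the OpenAI Python SDK; the model name and its 3072 dimensions come from the table above, while the client setup and example passages are assumptions for illustration.

```python
# Minimal sketch: embedding passages with text-embedding-3-large via the
# OpenAI Python SDK. Assumes OPENAI_API_KEY is set in the environment;
# the passages are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passages = [
    "Retrieval-augmented generation grounds LLM answers in your own documents.",
    "Embedding models map text to dense vectors for semantic search.",
]

response = client.embeddings.create(
    model="text-embedding-3-large",
    input=passages,
)

vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of dimension", len(vectors[0]))  # expect 3072 dimensions
```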
## Key Insights
Here's what the data tells us.
### Proprietary models lead in quality
OpenAI's text-embedding-3-large tops the leaderboard with a score of 95.1, followed closely by Voyage AI and Cohere. The leading proprietary models excel at semantic search across diverse domains and perform consistently on both long and short queries.
### Open source is catching up
Models like E5-mistral-7b-instruct and BGE-large-en-v1.5 prove that open-source embeddings can compete with proprietary options while offering full control and cost efficiency.
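As a concrete illustration of the self-hosted route, here is a minimal sketch that loads BGE-large-en-v1.5 from the table via the sentence-transformers library; the example query and documents are assumptions.

```python
# Minimal sketch: self-hosting an open-source embedding model with
# sentence-transformers. The model ID matches BGE-large-en-v1.5 from the
# leaderboard; the texts are placeholders.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = [
    "Vector databases store embeddings for fast similarity search.",
    "BM25 is a classic lexical ranking function.",
]
query = "How do I search documents by meaning?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ query_vec
print(scores)  # the vector-database sentence should score higher
```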
### Dimensions matter less
A higher dimension count doesn't guarantee better performance. Several top performers use 1024 or fewer dimensions, showing that efficient architectures can beat raw size.
## Methodology
Here's how we rank embedding models.
Quality scores are based on MTEB (Massive Text Embedding Benchmark) retrieval benchmarks, which evaluate models on semantic search tasks across diverse datasets. Scores are normalized to a 0-100 scale where 100 represents the best performance. We focus on retrieval-specific metrics that matter most for RAG applications.
### Testing Process
Each embedding model is evaluated on retrieval tasks from MTEB, including question answering, document retrieval, and semantic similarity. We measure how well models encode semantic meaning for finding relevant documents.
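For readers who want to reproduce this kind of evaluation, the sketch below shows the general shape of running a single MTEB retrieval task with the open-source `mteb` package; the task choice, model, and output folder are assumptions, and the exact API can vary between package versions.

```python
# Sketch of evaluating an embedding model on one MTEB retrieval task.
# The task ("SciFact"), model, and output folder are example choices;
# check the mteb documentation for the API of the version you install.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

evaluation = MTEB(tasks=["SciFact"])  # a single retrieval task as an example
results = evaluation.run(model, output_folder="results/bge-large-en-v1.5")
print(results)
```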
### Evaluation Metrics
Primary metrics include nDCG@10 (normalized Discounted Cumulative Gain at rank 10) and MRR (Mean Reciprocal Rank) on retrieval tasks. These measure how effectively models place relevant documents at the top of search results.
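To make these metrics concrete, here is a small self-contained sketch that computes nDCG@10 and MRR for toy ranked lists; the relevance labels are invented for illustration.

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one query: ranked relevance labels compared to the ideal ranking."""
    rels = relevances[:k]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_relevances):
    """Mean Reciprocal Rank over queries; each entry is a ranked 0/1 relevance list."""
    total = 0.0
    for rels in ranked_relevances:
        for i, rel in enumerate(rels):
            if rel > 0:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_relevances)

# Toy relevance judgments for the top-ranked results of a few queries.
print(round(ndcg_at_k([3, 2, 0, 1, 0], k=10), 3))  # close to 1.0 means a near-ideal ranking
print(mrr([[0, 1, 0], [1, 0, 0]]))                  # first relevant hit at ranks 2 and 1 -> 0.75
```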
### Score Calculation
Scores are normalized relative to the best-performing model in our tests. A score of 100 represents peak performance, while lower scores indicate proportional decreases in retrieval accuracy.
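As a worked illustration of that normalization, the snippet below rescales a set of raw benchmark averages against the best performer; the raw numbers are placeholders, not our measured results.

```python
# Sketch of the relative normalization described above.
# The raw averages are placeholders, not measured results.
raw_scores = {
    "model-a": 56.8,
    "model-b": 54.1,
    "model-c": 49.9,
}

best = max(raw_scores.values())
normalized = {name: round(score / best * 100, 1) for name, score in raw_scores.items()}
print(normalized)  # the best model maps to 100.0; the rest scale proportionally
```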
## FAQ
Common questions about embedding models for RAG.
- **What is an embedding model?** An embedding model converts text into dense numerical vectors that capture semantic meaning. These vectors enable similarity search, allowing RAG systems to find relevant documents by comparing meaning rather than just keywords (see the sketch after this list).
- **How do embedding dimensions affect performance?** More dimensions can capture more nuanced information, but they don't always lead to better results. Models with 768-1536 dimensions often perform as well as larger ones while being faster and more efficient.
- **Should I use a proprietary or open-source embedding model?** Proprietary models like OpenAI's text-embedding-3-large and Voyage offer top performance with managed infrastructure. Open-source options like BGE-large-en-v1.5 provide strong results with full control and lower long-term costs. Choose based on your accuracy needs and deployment preferences.
- **How often should I update my embedding model?** Upgrading requires re-embedding your entire corpus, which can be time-consuming. Only upgrade when there's a significant performance improvement (5+ points on this leaderboard's scale) or when your RAG quality noticeably degrades.
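To illustrate the similarity search described in the first question above, here is a minimal sketch that ranks documents by cosine similarity between embedding vectors; the tiny four-dimensional vectors are made-up placeholders standing in for real embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 4-dimensional "embeddings"; a real model produces hundreds or
# thousands of dimensions (see the Dimensions column in the table above).
doc_vectors = {
    "returns policy": np.array([0.9, 0.1, 0.0, 0.2]),
    "shipping times": np.array([0.1, 0.8, 0.3, 0.0]),
}
query_vector = np.array([0.85, 0.15, 0.05, 0.1])  # e.g. "can I send an item back?"

ranked = sorted(
    doc_vectors.items(),
    key=lambda kv: cosine_similarity(query_vector, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # "returns policy" ranks first even without shared keywords
```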