Google released Gemini Embedding 2, its first natively multimodal embedding model. It produces 3,072-dimensional vectors, supports inputs up to 8,192 tokens, and embeds text, images, audio, and video in one model. It's currently free in public preview via the Gemini API.
We added it to our Embedding Leaderboard and ran it against all 17 existing models across 7 retrieval datasets.
TL;DR
- Gemini Embedding 2 takes #1 with 1605 Elo and a 59.5% win rate
- Fewer than 20 Elo points separate the top three: Gemini Embedding 2, zembed-1, and Voyage 4
- Strongest on scientific retrieval (70.6% on SciFact) and Arabic QA (59.6% on ARCD)
- Weakest on financial QA (50.6% on FiQA) - barely above a coin flip
- Beats its predecessor Gemini text-embedding-004 in 80% of matchups
What We Found
Leads the leaderboard, narrowly
Gemini Embedding 2 reaches 1605 Elo; zembed-1 sits at 1590 and Voyage 4 at 1586. The gap is real but narrow: with a different query set, the ordering could shift.
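To put that gap in perspective, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. A minimal sketch (the function name is ours, not from the leaderboard code):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The 1605 vs. 1586 gap at the top implies only a slight head-to-head edge:
print(round(elo_expected_score(1605, 1586), 3))  # ~0.527
```

A 19-point Elo lead translates to winning roughly 53% of head-to-head matchups, which is why the top-three ordering should be treated as provisional.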

Below #3, there's a visible drop. Jina v5 Small and OpenAI text-3-large form the next tier around 1563-1566.
Domain performance varies
The overall win rate hides variance across datasets.

SciFact is the strongest result: 71% win rate. ARCD (Arabic QA) is also strong at 60%.
FiQA (financial QA) is the weakest at 51%, barely above a coin flip. Financial retrieval rewards exact terminology and numeric patterns, and the model's generalist training doesn't fully capture that. MSMARCO also comes in below 50%, meaning that on general short-query retrieval it doesn't consistently beat the top tier.
Edges the top tier, pulls away from the mid-tier
Against zembed-1 and Voyage 4, Gemini Embedding 2 wins 54% of matchups. Against mid-tier models, the gap widens.

The clearest signal is the predecessor comparison: against text-embedding-004, Gemini Embedding 2 wins 80% of matchups (48-6). That's not a refinement. It's a different model class.
How We Tested
- Each model embedded the same corpus and queries
- Retrieved top-5 results per query, shown to an LLM judge
- Judge picks which model's results are more relevant
- 7 datasets: MSMARCO, FiQA, SciFact, DBpedia, ARCD, plus two internal sets
- Elo computed from 1,065 pairwise judgments
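The post doesn't specify the exact Elo variant, so here is a minimal sketch of one common approach: a sequential K-factor update over a stream of (winner, loser) judge verdicts. The model names and K value are illustrative assumptions.

```python
from collections import defaultdict

def elo_from_judgments(judgments, k=32, base=1500.0):
    """Sequential Elo over (winner, loser) pairs; updates are zero-sum."""
    ratings = defaultdict(lambda: base)
    for winner, loser in judgments:
        expected_win = 1.0 / (
            1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400)
        )
        delta = k * (1.0 - expected_win)  # surprise-weighted update
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)

# Toy example with hypothetical labels (not the leaderboard's real data):
judgments = [("model-a", "model-b")] * 4 + [("model-b", "model-a")]
ratings = elo_from_judgments(judgments)
```

Note that sequential Elo is sensitive to judgment order; leaderboards often average ratings over many shuffled orderings, or fit a Bradley-Terry model instead, to get order-independent scores.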
Recommendation
Gemini Embedding 2 is a reasonable default for new pipelines. It leads the leaderboard, beats most models clearly, and costs nothing during preview.
If you're already on zembed-1 or Voyage 4, there's no strong reason to switch. The top three are within noise of each other. At that tier, your chunking strategy or reranker matters more than which embedding model you pick.
We'll keep it on the leaderboard. As the preview ends and pricing is announced, we'll see whether the performance holds at scale.
See the full rankings on the Embedding Leaderboard.
