Blog

What’s happening at Agentset.

Stay informed with product updates, company news, and insights on how to sell smarter at your company.

Monday, February 9, 2026

Umida Muratbekova

Voyage 4: Evaluation Notes

We tested Gemini 3 inside an actual retrieval setup and compared it directly with GPT-5.1 across five areas that matter for RAG.

Friday, February 6, 2026

Umida Muratbekova

Claude Opus 4.6 Performance in RAG

We evaluated Claude Opus 4.6 in a RAG setup to understand how it performs across factual retrieval, synthesis, and scientific tasks compared to 11 frontier models.

Monday, January 26, 2026

Enes Halit

How to Detect Hallucinations in RAG

RAG helps a lot. But hallucinations still happen. We tested four detection approaches to find which ones work best for production—comparing accuracy, latency, and cost trade-offs.

Thursday, December 25, 2025

Umida Muratbekova

Multimodal vs Text Embeddings: Performance Comparison

We compared a text-based and a multimodal embedding pipeline across text, tables, and charts to see where multimodal actually helps.

Thursday, December 18, 2025

Umida Muratbekova

Gemini 3 Flash: A strong factual RAG model

We evaluated Gemini 3 Flash in a RAG setup to understand where it performs best and where its limitations show, focusing on factual retrieval, reasoning depth, and grounding.

Saturday, December 13, 2025

Umida Muratbekova

Cohere Rerank 4: A real upgrade over 3.5

We benchmarked Cohere Rerank 4 Pro and Fast against v3.5 and other rerankers under the same RAG pipeline.

Friday, December 12, 2025

Umida Muratbekova

GPT-5.2 RAG Performance: We Tested It

We plugged GPT-5.2 into our LLM RAG leaderboard and compared it against nine other frontier models under the same RAG pipeline.

Friday, December 5, 2025

Umida Muratbekova

Best Vector Databases for RAG

We reviewed seven popular vector databases to understand how they differ in deployment, cost, and where they fit in real RAG systems.

Tuesday, November 25, 2025

Umida Muratbekova

Opus 4.5 is the new best model for RAG

An evaluation of Opus 4.5 inside a real retrieval setup, compared against Gemini 3 Pro and GPT-5.1 across five behaviors that matter for RAG.

Wednesday, November 19, 2025

Umida Muratbekova

Gemini 3 vs GPT 5.1 for RAG

We tested Gemini 3 inside an actual retrieval setup and compared it directly with GPT-5.1 across five areas that matter for RAG.

Sunday, November 16, 2025

Umida Muratbekova

Embedding models have converged

We compared 13 embedding models across 8 datasets using an LLM judge and Elo scoring. The result: almost all of them perform in the same narrow band.

Friday, November 7, 2025

Umida Muratbekova

Best Reranker for RAG: We tested the top models

We benchmarked eight leading rerankers under identical conditions to find which one performs best for real-world RAG pipelines — comparing speed, accuracy, and LLM-judged relevance.

Monday, October 27, 2025

Umida Muratbekova

Cohere vs ZeRank: Which Reranker Actually Performs Better?

We compared Cohere v3.5 and ZeRank-1 in a RAG pipeline using a BEIR subset and a custom dataset — analyzing accuracy, latency, and LLM preference.

Thursday, May 1, 2025

Abdellatif Abdelfattah

Building Effective RAG Pipelines: A Practical Guide

Learn how to design and implement robust retrieval-augmented generation (RAG) pipelines, from document processing to retrieval optimization.

Tuesday, April 15, 2025

Abdellatif Abdelfattah

Is RAG Dead?

OpenAI released the GPT 4.1 models, which support a 1M-token context window. Gemini supports up to 10M tokens in research. Is the RAG era over?

Tuesday, March 25, 2025

Abdellatif Abdelfattah

Automate Business Workflows with AI Agents

Discover how AI agents can transform business operations by automating complex workflows, reducing manual effort, and improving efficiency.

Monday, March 10, 2025

Abdellatif Abdelfattah

Building a Proof-of-Concept RAG System in an Afternoon

A practical guide to quickly building a functional retrieval-augmented generation system to demonstrate the value of AI-powered document search.

Tuesday, February 25, 2025

Abdellatif Abdelfattah

The Art of Document Chunking for LLM Applications

Explore the nuances of effective document chunking strategies for retrieval-augmented generation systems and how they impact LLM performance.

Monday, February 10, 2025

Abdellatif Abdelfattah

Parsing PDF Documents at Scale

Learn strategies and techniques to efficiently extract structured information from large volumes of PDF documents for use in AI applications.
