dev-resources.site

Enhancing Hybrid Search in MongoDB: Combining RRF, Thresholds, and Weights

Published at
12/20/2024
Categories
mongodb
ai
vectordatabase
rag
Author
shannonlal

In my previous posts, I explored implementing basic hybrid search in MongoDB by combining vector and text search (https://dev.to/shannonlal/optimizing-mongodb-hybrid-search-with-reciprocal-rank-fusion-4p3h). While that approach worked, I had trouble getting the most relevant results to the top. This post discusses three key improvements I've implemented: Reciprocal Rank Fusion (RRF), similarity thresholds, and search type weighting.

The Three Pillars of Enhanced Hybrid Search

1. Reciprocal Rank Fusion (RRF)

RRF is a technique for combining results from different search methods based on their ranking positions. Instead of simply adding raw scores, RRF scores each result as 1 / (rank + k), which gives more weight to higher-ranked results while smoothing out differences between score scales. In the stage below, k is 60 and the result is multiplied by the vector search weight:

{
  $addFields: {
    vs_rrf_score: {
      $multiply: [
        0.4, // vectorWeight
        { $divide: [1.0, { $add: ['$rank', 60] }] },
      ],
    },
  },
}
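
To see what this stage computes, the same formula can be evaluated in plain JavaScript. This is a minimal sketch rather than part of the original pipeline: `rrfScore` is a hypothetical helper name, and it assumes `rank` is 0-based (the first result has rank 0). The constant 60 and the weight 0.4 come from the stage above.

```javascript
// Hypothetical helper mirroring the $addFields stage:
// score = weight * 1 / (rank + k)
function rrfScore(rank, weight = 0.4, k = 60) {
  return weight * (1 / (rank + k));
}

// Ranks are assumed to be 0-based here (an assumption, not stated in the post).
const ranks = [0, 1, 9];
const scores = ranks.map((rank) => rrfScore(rank));
console.log(scores);
```

The scores for rank 0 and rank 9 are close (roughly 0.0067 versus 0.0058), which is the smoothing effect: position matters, but no single result dominates.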

2. Similarity Thresholds

To ensure quality results, I've added minimum thresholds for both vector and text search scores:

// Vector search threshold
{
  $match: {
    vectorScore: { $gte: 0.9 }
  }
}

// Text search threshold
{
  $match: {
    textScore: { $gte: 0.5 }
  }
}

This prevents low-quality matches from appearing in the results, even if they would have received a boost from the RRF calculation. In the example above I chose 0.9 for the vector similarity score and 0.5 for the text score; adjust these based on how the results look with your own data.
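
When tuning these cut-offs, it can help to generate the `$match` stage from a parameter and to try the same check against a handful of scores in memory. The sketch below is an illustration only: `thresholdStage` and `passesThreshold` are hypothetical helper names, and the sample scores are made up.

```javascript
// Hypothetical helper: builds a $match stage for a minimum score on a field.
function thresholdStage(field, minScore) {
  return { $match: { [field]: { $gte: minScore } } };
}

// In-memory equivalent of the $gte check, for experimenting with cut-offs.
// Documents without a numeric score are dropped.
function passesThreshold(doc, field, minScore) {
  return typeof doc[field] === 'number' && doc[field] >= minScore;
}

// Made-up sample scores, purely for illustration.
const sample = [
  { _id: 'a', vectorScore: 0.95 },
  { _id: 'b', vectorScore: 0.89 },
  { _id: 'c', vectorScore: 0.9 },
];

const kept = sample.filter((doc) => passesThreshold(doc, 'vectorScore', 0.9));
console.log(kept.map((doc) => doc._id)); // [ 'a', 'c' ]
console.log(JSON.stringify(thresholdStage('textScore', 0.5)));
```

Keeping the threshold as a parameter makes it easier to compare result quality at, say, 0.85 and 0.9 without editing the pipeline by hand.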

3. Weighted Search Types

Different search types perform better for different queries. I've implemented weights to balance their contributions:

{
  $addFields: {
    combined_score: {
      $add: [
        { $multiply: [{ $ifNull: ['$vectorScore', 0] }, 0.4] },
        { $multiply: [{ $ifNull: ['$textScore', 0] }, 0.6] }
      ]
    }
  }
}

In this example I give text search slightly more weight (0.6) than vector search (0.4), but again you can adjust these based on your own search tests.
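
The same weighted sum can be written in plain JavaScript to check how a document that appears in only one result set is scored. This is a sketch under assumptions: `combinedScore` is a hypothetical helper name, and the input documents are invented for illustration. The `??` operator plays the role of `$ifNull`, treating a missing or null score as 0.

```javascript
// Hypothetical helper mirroring the combined_score stage above.
function combinedScore(doc, vectorWeight = 0.4, textWeight = 0.6) {
  const vectorScore = doc.vectorScore ?? 0; // missing or null -> 0, like $ifNull
  const textScore = doc.textScore ?? 0;
  return vectorScore * vectorWeight + textScore * textWeight;
}

// Invented documents: one found by both searches, one by text search only.
const both = { _id: 'a', vectorScore: 0.5, textScore: 0.5 };
const textOnly = { _id: 'b', textScore: 1 };

console.log(combinedScore(both));
console.log(combinedScore(textOnly));
```

A document found only by text search still gets a score, because its missing vector score contributes 0 instead of making the whole sum null.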

Putting It All Together

Here's a simplified version of the complete pipeline:

[
  // Vector Search with threshold
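  {
    $vectorSearch: {
      index: 'ai_image_vector_description',
      path: 'descriptionValues',
      queryVector: embedding,
      numCandidates: 100, // illustrative value; required for approximate search
      limit: 20, // illustrative value; $vectorSearch requires a limit
      filter: {
        userId: userId,
        deleted: false,
      }
    }
  },
  // Expose the similarity score so the threshold below has a field to filter on
  { $addFields: { vectorScore: { $meta: 'vectorSearchScore' } } },
  { $match: { vectorScore: { $gte: 0.9 } } },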
  // RRF calculation for vector search
  {
    $group: {
      _id: null,
      docs: { $push: '$$ROOT' }
    }
  },
  // ... RRF calculation stages ...
  {
    $unionWith: {
      // Text search pipeline with similar structure
    }
  },
  // Final combination and sorting
  {
    $sort: { combined_score: -1 }
  }
]
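
The pipeline above elides the RRF and combination stages. As a rough illustration of what the fusion step amounts to, here is an in-memory sketch of weighted RRF over two ranked lists of ids. It is not the author's elided stages: `fuseRankings` is a hypothetical helper, ranks are assumed to be 0-based, and the ids are invented. The weights (0.4 and 0.6) and the constant 60 are the values used earlier in the post.

```javascript
// Hypothetical in-memory fusion of two ranked lists of document ids.
// Each list is ordered best-first; a document's position is its rank.
function fuseRankings(vectorIds, textIds, options = {}) {
  const { vectorWeight = 0.4, textWeight = 0.6, k = 60 } = options;
  const totals = new Map();

  const accumulate = (ids, weight) => {
    ids.forEach((id, rank) => {
      const score = weight / (rank + k); // 0-based rank (an assumption)
      totals.set(id, (totals.get(id) ?? 0) + score);
    });
  };

  accumulate(vectorIds, vectorWeight);
  accumulate(textIds, textWeight);

  // Highest combined score first, like the final $sort stage.
  return [...totals.entries()]
    .map(([id, combined_score]) => ({ _id: id, combined_score }))
    .sort((a, b) => b.combined_score - a.combined_score);
}

// Invented ids: 'b' appears in both lists, so its scores are summed.
const fused = fuseRankings(['a', 'b', 'c'], ['b', 'd']);
console.log(fused.map((doc) => doc._id)); // [ 'b', 'd', 'a', 'c' ]
```

Documents returned by both searches accumulate two contributions and rise to the top, while documents returned by only one search keep a single weighted score.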

Benefits and Results

This enhanced approach provides several benefits:

  1. More relevant results by considering both ranking position and raw scores
  2. Quality control through minimum thresholds
  3. Flexible weighting to optimize for different use cases

The combination of these techniques has significantly improved our search results, particularly for queries where simple score addition wasn't providing optimal ordering.

Next Steps

Future improvements could include:

  • Dynamic weight adjustment based on query characteristics
  • Additional quality metrics beyond simple thresholds
  • Performance optimization for larger datasets

By implementing these enhancements, we've created a more robust and reliable hybrid search system that better serves our users' needs.
