dev-resources.site

Hybrid Search with Elasticsearch in .NET

Published at
11/5/2024
Categories
dotnet
elasticsearch
csharp
Author
nikiforovall

TL;DR

Use the reciprocal rank fusion algorithm to combine the results of BM25 and kNN semantic search.

Source code: https://github.com/NikiforovAll/elasticsearch-dotnet-playground/blob/main/src/elasticsearch-getting-started/02-hybrid-search.ipynb

Introduction

I’ve prepared a Jupyter notebook that demonstrates how to use the reciprocal rank fusion algorithm with Elastic.Clients.Elasticsearch. The source code is linked in the TL;DR above.

Hybrid Search

In my previous blog posts, you have seen two different approaches to searching a collection of documents (Semantic Search with Elasticsearch in .NET and Querying and Filtering via Elastic.Clients.Elasticsearch in .NET), each with its own benefits. If one of these methods matches your needs, you don’t need anything else. In many cases, though, each method returns valuable results that the other would miss, so the best option is to offer a combined result set.

For these cases, Elasticsearch offers Reciprocal Rank Fusion (RRF), an algorithm that combines results from two or more lists into a single list.

How RRF Works in Elasticsearch

โ˜๏ธ RRF is based on the concept of reciprocal rank, which is the inverse of the rank of the first relevant document in a list of search results. The goal of the technique is to take into account the position of the items in the original rankings, and give higher importance to items that are ranked higher in multiple lists. This can help improve the overall quality and reliability of the final ranking, making it more useful for the task of fusing multiple ordered search results.

Elasticsearch integrates the RRF algorithm into the search query. Consider the following example, which has query and knn sections to request full-text and vector searches respectively, and a rank section whose rrf entry combines them into a single result list.

{
    "query":{
        // full-text search query here
    },
    "knn":{
        // vector search query here
    },
    "rank":{
        "rrf": {}
    }
}

While RRF works fairly well for short lists of results without any configuration, there are some parameters that can be tuned to provide the best results. Consult the documentation to learn about these in detail.
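One of those parameters is the rank constant (`rank_constant` in the Elasticsearch RRF documentation, with a default of 60). The sketch below is plain arithmetic, not a call to Elasticsearch, and the two documents and their ranks are made up for illustration. It shows that a small constant favors a document that tops a single list, while a larger constant favors a document that ranks consistently well in both.

```csharp
using System;
using System.Linq;

// RRF score of one document, given its 1-based rank in each result list.
double RrfScore(int[] ranks, int rankConstant) =>
    ranks.Sum(rank => 1.0 / (rankConstant + rank));

// Hypothetical documents: A is 1st in one list but 10th in the other; B is 3rd in both.
var ranksA = new[] { 1, 10 };
var ranksB = new[] { 3, 3 };

foreach (var rankConstant in new[] { 1, 60 })
{
    var a = RrfScore(ranksA, rankConstant);
    var b = RrfScore(ranksB, rankConstant);
    var winner = a > b ? "A" : "B";
    Console.WriteLine($"rank_constant={rankConstant}: A={a:F4}, B={b:F4}, winner={winner}");
}
```

With a constant of 1, document A wins on the strength of its single first place. With a constant of 60, the gap between neighboring ranks shrinks and document B wins because it ranks well in both lists.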

Demo

โš ๏ธ I assume you already have the โ€œbook_indexโ€ dataset from my previous post Semantic Search with Elasticsearch in .NET. If you donโ€™t, please follow the instructions in that post to set up the dataset.

1๏ธโƒฃ First, letโ€™s define a method to convert a search query to an embedding vector. I will use Microsoft.Extensions.AI. It requires Azure Open AI model to generate embeddings. I will go with (text-embedding-3-small) because it allows you to specify embedding size. This is important because the size of the embedding vector should match the size of the vector field in the Elasticsearch index and Elasticsearch has a limit of 512 dimensions for vector fields.

💡 Larger embeddings can capture more nuances and subtle relationships in the data, potentially leading to better model accuracy. However, very large embeddings can also lead to overfitting, where the model performs well on training data but poorly on unseen data. This is because the model might learn to memorize the training data rather than generalize from it.

Here is the code to generate an embedding vector for a given text:

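using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using System.ClientModel;
using System.Linq;

// envs holds the endpoint and API key; it is defined elsewhere in the notebook (not shown here).
AzureOpenAIClient aiClient = new AzureOpenAIClient(
    new Uri(envs["AZURE_OPENAI_ENDPOINT"]),
    new ApiKeyCredential(envs["AZURE_OPENAI_APIKEY"]));

var generator = aiClient.AsEmbeddingGenerator(modelId: "text-embedding-3-small");

// Converts a text into an embedding vector of a fixed size.
async Task<float[]> ToEmbedding(string text)
{
    // Must match the dimension count of the vector field in the index.
    var textEmbeddingDimension = 384;
    var embeddings = await generator.GenerateAsync([text], new EmbeddingGenerationOptions {
        Dimensions = textEmbeddingDimension
    });

    // One input text yields one embedding.
    return embeddings.First().Vector.ToArray();
}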
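Because a dimension mismatch only surfaces once the query reaches Elasticsearch, it can help to check the vector length up front. The helper below is a hypothetical addition, not part of the original notebook. The name `EnsureDimensions` and the default of 384 are assumptions; the default must mirror the `dims` value of the vector field in your index mapping.

```csharp
using System;

// Throws if the embedding length differs from the dimension count the index expects.
// expectedDimensions is an assumption: it must mirror the "dims" value of the
// dense_vector field in the index mapping.
float[] EnsureDimensions(float[] embedding, int expectedDimensions = 384)
{
    if (embedding.Length != expectedDimensions)
    {
        throw new ArgumentException(
            $"Expected {expectedDimensions} dimensions but got {embedding.Length}.",
            nameof(embedding));
    }

    return embedding;
}

// Usage with a stand-in vector; in the notebook this would wrap the result of ToEmbedding.
var checkedEmbedding = EnsureDimensions(new float[384]);
Console.WriteLine(checkedEmbedding.Length);
```

Failing fast here gives a clearer error message than a rejected search request would.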

2๏ธโƒฃ Now we can use this method to generate an embedding vector for a search query and use it in both full-text and vector search queries.

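var searchQuery = "python programming";
var queryEmbedding = await ToEmbedding(searchQuery);

// client (the Elasticsearch client) and PrettyPrint are defined elsewhere in the notebook (not shown here).
var searchResponse = await client.SearchAsync<Book>(s => s
    .Index("book_index")
    // Full-text (BM25) match on the summary, using the raw query text.
    .Query(d => d.Match(m => m.Field(f => f.Summary).Query(searchQuery)))
    // Vector search on the title embedding, using the query embedding.
    .Knn(d => d
        .Field(f => f.TitleVector)
        .QueryVector(queryEmbedding)
        .k(5)               // number of nearest neighbors to return
        .NumCandidates(10)) // candidates considered per shard
    // Combine both result lists with RRF, using the default settings.
    .Rank(r => r.Rrf(rrf => {}))
);

PrettyPrint(searchResponse);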
โš™๏ธOutput: Hybrid Search

Conclusion

🙌 I hope you found it helpful. If you have any questions, please feel free to reach out. If you’d like to support my work, a star on GitHub would be greatly appreciated! 🙏
