A Quick Walkthrough of Semantic Kernel's Kusto Connector for Vector Database Integration
Introduction
Semantic Kernel gives developers a flexible platform for building applications on top of large language models (LLMs).
Because Semantic Kernel is open source, I recently spent some time reading its code and found that it uses connectors to integrate with external services such as OpenAI, Azure AI Search, and various vector databases. One of these connectors targets Kusto DB.
In this post, I'll explore this Kusto connector.
Semantic Kernel is evolving rapidly, so the code examples here, written against the 1.0.x releases (the packages below are pinned to 1.0.1), may need changes for later versions.
Understanding Kusto Databases
Kusto DB excels in managing and querying large data streams, a capability crucial for handling extensive datasets common in logs and telemetry. If you're new to Kusto DB, a good starting point is the Kusto Query Language Documentation.
Setting Up the Environment
Prerequisites
For this walkthrough, I used a local Jupyter Notebook with a .NET interactive kernel. To replicate this setup, you'll need the following packages:
// NuGet references first, then the using directives.
#r "nuget: Microsoft.SemanticKernel, 1.0.1"
#r "nuget: Microsoft.KernelMemory.Core, 0.25.240103.1"
#r "nuget: Microsoft.SemanticKernel.Connectors.Kusto, 1.0.1-alpha"
#r "nuget: Azure.Core, 1.36.0"
#r "nuget: Microsoft.Azure.Kusto.Data, 12.0.0"
using Microsoft.SemanticKernel;
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel.Memory;
// Namespace of AzureOpenAITextEmbeddingGenerationService, used further down.
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Connectors.Kusto;
using Kusto.Data;
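The three packages that matter most below are Microsoft.SemanticKernel (the kernel and the Azure OpenAI embedding service), Microsoft.SemanticKernel.Connectors.Kusto (the KustoMemoryStore), and Microsoft.Azure.Kusto.Data (the KustoConnectionStringBuilder used to connect).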
Establishing the Connection
The first step is connecting to the Kusto DB using a connection string:
var connectionString = new KustoConnectionStringBuilder("<connection string>")
.WithAadUserPromptAuthentication();
Embedding Generator Setup
Next, we set up the embedding generator, essential for creating text embeddings:
var embeddingGenerator = new AzureOpenAITextEmbeddingGenerationService(
"<deployment>",
"<endpoint>",
"<api key>"
);
Configuring Text Memory
To finalize the setup, configure the text memory using the connection string and embedding generator:
#pragma warning disable
KustoMemoryStore memoryStore = new(connectionString, "<kusto db name>");
SemanticTextMemory textMemory = new(memoryStore, embeddingGenerator);
#pragma warning restore
Storing Information
With everything set up, it's time to store some data:
// Await each call so that a failure surfaces here and the writes finish before we search.
await textMemory.SaveInformationAsync("meDef", id: "doc1", text: "My name is Andrea.");
await textMemory.SaveInformationAsync("meDef", id: "doc2", text: "I am 30 years old.");
await textMemory.SaveInformationAsync("meDef", id: "doc3", text: "I live in South America.");
This step creates a new table in the specified Kusto database if it doesn't already exist. Each SaveInformationAsync call then inserts a record with the columns "Key", "Metadata", "Timestamp", and "Embedding", where the embedding is produced by the embedding generator.
Retrieving Answers
To perform a similarity search and get relevant answers, run:
// Return up to two matches whose relevance score is at least 0.79.
await foreach (var answer in textMemory.SearchAsync(
    collection: "meDef",
    query: "What's my name?",
    limit: 2,
    minRelevanceScore: 0.79,
    withEmbeddings: true))
{
    Console.WriteLine($"Answer: {answer.Metadata.Text}");
}
The SearchAsync function internally generates a KQL query that uses functions like series_cosine_similarity_fl to rank records based on similarity.
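To make the ranking step more concrete, here is a minimal in-memory sketch of the same idea in C#. It is not the connector's code or the KQL it generates. The CosineSimilarity helper, the toy three-dimensional vectors, and the record keys are all made up for illustration. Only the limit of 2 and the minimum score of 0.79 mirror the SearchAsync call above.

```csharp
using System;
using System.Linq;

// Illustrative helper (not part of Semantic Kernel): cosine similarity
// between two vectors of equal length.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // A zero vector has no direction, so treat its similarity as 0.
    if (normA == 0 || normB == 0) return 0;
    return dot / Math.Sqrt(normA * normB);
}

// Toy records standing in for stored rows; real embeddings are much longer.
var records = new (string Key, float[] Embedding)[]
{
    ("doc1", new[] { 1f, 0f, 1f }),
    ("doc2", new[] { 0f, 1f, 0f }),
    ("doc3", new[] { 1f, 1f, 1f }),
};
float[] query = { 1f, 0f, 1f };

// Score every record, drop weak matches, keep the best two.
var ranked = records
    .Select(r => (r.Key, Score: CosineSimilarity(query, r.Embedding)))
    .Where(r => r.Score >= 0.79)
    .OrderByDescending(r => r.Score)
    .Take(2)
    .ToList();

foreach (var r in ranked)
    Console.WriteLine($"{r.Key}: {r.Score:F3}");
```

Every record is scored against the query here, which is the brute-force approach; there is no index that would let the search skip rows.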
Conclusion and Thoughts
- Semantic Kernel is rapidly evolving, and its usage of connectors like Kusto DB might change.
- Kusto DB, while powerful for stream data processing, isn't built around similarity-search indexing, so retrieval may slow down as the number of stored embeddings grows.
- As Kusto DB isn't inherently a semantic search tool and lacks chunking functionality, additional tools might be necessary for storing semantic memory effectively.
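As a rough illustration of the chunking mentioned in the last point, the sketch below splits text into overlapping, fixed-size word windows before anything is stored. ChunkWords is a hypothetical helper written for this post, not a Semantic Kernel or Kusto API, and the window size and overlap are arbitrary values chosen to keep the example small.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split text into chunks of at most maxWords words,
// where consecutive chunks share `overlap` words.
static List<string> ChunkWords(string text, int maxWords, int overlap)
{
    if (maxWords <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxWords));
    if (overlap < 0 || overlap >= maxWords)
        throw new ArgumentOutOfRangeException(nameof(overlap));

    string[] words = text.Split(
        new[] { ' ', '\t', '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries);

    var result = new List<string>();
    int step = maxWords - overlap;
    for (int start = 0; start < words.Length; start += step)
    {
        int count = Math.Min(maxWords, words.Length - start);
        result.Add(string.Join(" ", words, start, count));
        if (start + count >= words.Length) break; // this chunk reached the end
    }
    return result;
}

var chunks = ChunkWords(
    "My name is Andrea. I am 30 years old. I live in South America.",
    maxWords: 6,
    overlap: 2);

for (int i = 0; i < chunks.Count; i++)
    Console.WriteLine($"doc1-{i}: {chunks[i]}");
```

Each chunk could then be saved as its own record, for example with ids like doc1-0 and doc1-1, so that a search returns the relevant passage instead of a whole document.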
All comments and insights are welcome.