
RAG observability in 2 lines of code with Llama Index & Langfuse

Published: 3/18/2024
Categories: observability, llamaindex, rag, opensource
Author: clemra

Why you need observability for RAG

There are so many different ways to make RAG work for a use case. What vector store to use? What retrieval strategy to use? LlamaIndex makes it easy to try many of them without having to deal with the complexity of integrations, prompts and memory all at once.

Before starting Langfuse, we worked on complex RAG/agent applications ourselves and quickly realized there is a new need for observability and experimentation to tweak and iterate on the details. In the end, these details are what take you from something cool to an actually reliable RAG application that is safe for users and customers. Think of it this way: if there is a user session of interest in your production RAG application, how quickly can you check whether the retrieved context for that session was actually relevant and whether the LLM response was on point?

What is Langfuse?

Thus, we started working on Langfuse.com (GitHub) to build an open source LLM engineering platform with tightly integrated features for tracing, prompt management, and evaluation. In the beginning we just solved our own and our friends’ problems. Today, over 1,000 projects rely on Langfuse, and the repository has 2.3k stars on GitHub. You can either self-host Langfuse or use the cloud instance maintained by us.

We are thrilled to announce our new integration with LlamaIndex today. This feature was highly requested by our community and aligns with our project's focus on native integration with major application frameworks. Thank you to everyone who contributed and tested it during the beta phase!

Integrating Llama Index with Langfuse for open source observability and analytics

The challenge

We love LlamaIndex: its clean, standardized interface abstracts a lot of complexity away. Let’s take this simple example of a VectorStoreIndex and a ChatEngine.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load local documents and add them to a vector index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Initialize a ChatEngine with memory and have a stateful conversation
chat_engine = index.as_chat_engine()

print(chat_engine.chat("What problems can I solve with RAG?"))
print(chat_engine.chat("How do I optimize my RAG application?"))

In just three lines, we loaded our local documents, added them to an index, and initialized a ChatEngine with memory. We then had a stateful conversation with the chat_engine.

This is awesome to get started, but we quickly run into questions like:

  • “What context is actually retrieved from the index to answer the questions?”
  • “How is chat memory managed?”
  • “Which steps add the most latency to the overall execution, and how can we optimize them?”

One-click OSS observability to the rescue

We built Langfuse as a one-click integration with LlamaIndex using the global callback manager.

Preparation

  1. Install the community package (pip install llama-index-callbacks-langfuse)
  2. Copy/paste the environment variables from the Langfuse project settings to your Python project: 'LANGFUSE_SECRET_KEY', 'LANGFUSE_PUBLIC_KEY' and 'LANGFUSE_HOST' (one way to set them in code is sketched below)
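
If you prefer setting the keys in code rather than via a .env file, a minimal sketch with placeholder values looks like this:

import os

# Placeholder values: copy the real keys from your Langfuse project settings
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or your self-hosted URL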

Now, you only need to set the global langfuse handler:

from llama_index.core import set_global_handler

set_global_handler("langfuse")

And voilà, with just two lines of code you get detailed traces for all aspects of your RAG application in Langfuse. They automatically include latency and usage/cost breakdowns.
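
One note for short-lived scripts: events are sent to Langfuse asynchronously in the background, so it makes sense to flush the handler before the process exits. A minimal sketch, assuming the handler set by set_global_handler exposes a flush method:

from llama_index.core import global_handler

# Traces are sent in the background; flush before exiting short-lived scripts
global_handler.flush()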

Group multiple chat threads into a session

Working with many teams building GenAI/LLM/RAG applications, we’ve continuously added features that help debug and improve these applications. One example is session tracking for conversational applications, so you can see traces in the context of a full message thread.

To activate it, add an id that identifies the session as a trace param before calling the chat_engine:

from llama_index.core import global_handler

# All subsequent traces are grouped under this session id
global_handler.set_trace_params(
    session_id="your-session-id"
)

chat_engine.chat("What did he do growing up?")
chat_engine.chat("What did he do at USC?")
chat_engine.chat("How old is he?")

You can then see all of these chat invocations grouped into a single session view in Langfuse Tracing:

RAG Observability with Langfuse

Besides sessions, you can also track individual users or add tags and metadata to your Langfuse traces.
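
A minimal sketch of what that can look like via the same set_trace_params call; the user id, tags, and metadata values here are illustrative, and we assume set_trace_params passes through the standard Langfuse trace params:

from llama_index.core import global_handler

# Illustrative values: attach a user, tags, and metadata to subsequent traces
global_handler.set_trace_params(
    user_id="user-123",
    tags=["production", "rag"],
    metadata={"experiment": "chunk-size-512"},
)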

Trace more complex applications and use other Langfuse features for prompt management and evaluation

This integration makes it easy to get started with tracing. If your application grows to include custom logic or other frameworks/packages, all Langfuse integrations are fully interoperable.
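
As a sketch of what that interoperability can look like, you can instantiate the callback handler directly instead of using the global handler, e.g. to combine it with other callbacks in the same manager (assuming the LlamaIndexCallbackHandler exported by the langfuse package):

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Register the Langfuse handler alongside any other callbacks you use
langfuse_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_handler])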

We have also built features to version-control and collaborate on prompts (Langfuse prompt management), track experiments, and evaluate production traces. For RAG specifically, we collaborated with the RAGAS team, and it’s easy to run their popular eval suite on traces captured with Langfuse (see cookbook).
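
To give a flavor of prompt management, here is a minimal sketch that fetches a managed prompt and fills in its variables; the prompt name "rag-qa" and its question variable are hypothetical:

from langfuse import Langfuse

langfuse = Langfuse()

# "rag-qa" is a hypothetical prompt name managed in your Langfuse project
prompt = langfuse.get_prompt("rag-qa")

# compile() substitutes {{question}}-style variables in the prompt template
compiled_prompt = prompt.compile(question="What problems can I solve with RAG?")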

Get started

The easiest way to get started is to follow the cookbook and check out the docs.

Feedback? Ping us

We’d love to hear any feedback. Come join our community Discord or add your thoughts to this GitHub thread.
