Talk with your PDF documents in SharePoint

Published at 12/6/2024
Categories: rag, llm, langchain, sharepoint
Author: vincent_cys

A dreadful Teams/Slack message popped up! “Hey, could you help me find out which documents contain [information]?” You opened the SharePoint folder, only to realize that you had no idea which documents this information belonged to.

Fear not! In this article, we will build a RAG (retrieval-augmented generation) application to search through the mountain of PDF documents in your SharePoint.

RAG app: https://finance-chatbot-vincent-cheng.streamlit.app/

[Image: RAG app preview]

Tech Stack

  1. Vector database: ChromaDB
  2. LLMs: OpenAI’s gpt-4o-mini, Google’s Gemini 1.5 Flash-8B
  3. Text embeddings: OpenAI’s text-embedding-3-large, Google’s embedding-001
  4. Frontend: Streamlit
  5. Hosting: Streamlit Community Cloud
  6. Framework: LangChain
  7. Document storage: Microsoft SharePoint
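The app lets the user choose between the two providers. The embedding model must match the one used to build the collection, because vectors from different embedding models are not comparable. Below is a minimal sketch of keeping each provider's chat model, embedding model, and API-key variable together. ModelConfig, MODEL_CONFIGS, and select_model are hypothetical names. The model IDs "gemini-1.5-flash-8b" and "models/embedding-001", and the environment variable names OPENAI_API_KEY and GOOGLE_API_KEY, are assumptions based on the providers' usual conventions rather than values taken from the article.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    llm: str          # chat model used for generation
    embedding: str    # embedding model used for both indexing and retrieval
    api_key_env: str  # environment variable assumed to hold the provider's API key


# Hypothetical lookup table pairing each provider's LLM with its embedding model
MODEL_CONFIGS = {
    "OpenAI": ModelConfig("gpt-4o-mini", "text-embedding-3-large", "OPENAI_API_KEY"),
    "Google": ModelConfig("gemini-1.5-flash-8b", "models/embedding-001", "GOOGLE_API_KEY"),
}


def select_model(provider):
    """Return the config for the chosen provider, failing early if its API key is missing."""
    if provider not in MODEL_CONFIGS:
        raise ValueError(f"Unknown provider {provider!r}; choose from {sorted(MODEL_CONFIGS)}")
    config = MODEL_CONFIGS[provider]
    if not os.environ.get(config.api_key_env):
        raise RuntimeError(f"Set {config.api_key_env} before building the vector database")
    return config
```

Because the LLM and embedding names live in one record, a user selection cannot accidentally pair one provider's chat model with the other provider's embeddings.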

Architecture Overview

[Image: Architecture Overview]

GitHub: https://github.com/cyshen11/finance-chatbot/tree/main

Index

In the indexing step, we convert the PDF documents into vector embeddings and store them in a vector database.

Since your documents are stored in SharePoint, we can load them with LangChain’s SharePointLoader. Before using the SharePointLoader, we need to obtain a few parameters: O365_CLIENT_ID, O365_CLIENT_SECRET, O365_TOKEN, DOCUMENT_LIBRARY_ID, and FOLDER_ID. You can follow this guide to obtain these parameters. For O365_TOKEN, convert the contents of o365_token.txt into TOML format, then copy the output and paste it into your Streamlit secrets in the following format:

```toml
[O365_TOKEN]
token_type = ...
scope = ...
expires_in = ...
...
```
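The article does not say how the conversion to TOML is done. Below is a minimal stdlib sketch that produces the [O365_TOKEN] table shown above. It assumes o365_token.txt holds a flat JSON object whose values are strings, numbers, booleans, or lists of those; nested objects and nulls are rejected. The function names are hypothetical.

```python
import json
import re

# TOML bare keys may only contain ASCII letters, digits, underscores, and dashes
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def to_toml_value(value):
    """Render a JSON scalar or list as a TOML value (nested objects are not supported)."""
    if isinstance(value, bool):  # check bool before int, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float, str)):
        # For strings, ints, and finite floats, the JSON encoding is also valid TOML
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ", ".join(to_toml_value(v) for v in value) + "]"
    raise TypeError(f"Cannot convert {value!r} to a flat TOML value")


def token_json_to_toml(token_text, table="O365_TOKEN"):
    """Convert the JSON text of o365_token.txt into a TOML table for Streamlit secrets."""
    token = json.loads(token_text)
    lines = [f"[{table}]"]
    for key, value in token.items():
        # Quote keys that are not valid TOML bare keys
        toml_key = key if BARE_KEY.fullmatch(key) else json.dumps(key)
        lines.append(f"{toml_key} = {to_toml_value(value)}")
    return "\n".join(lines) + "\n"
```

For example, pass the text read from ~/.credentials/o365_token.txt to token_json_to_toml, print the result, and paste it into the Streamlit secrets editor.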

In the Python code, read these secrets, convert the token back into JSON, and write it to o365_token.txt in the directory Path.home() / ".credentials". Then initialize the SharePointLoader with auth_with_token=True so that it authenticates with the saved token, and load the documents.

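```python
import json
from pathlib import Path

import streamlit as st
from langchain_community.document_loaders import SharePointLoader

# Convert the secrets section to a plain dict so that json can serialize it
O365_TOKEN = dict(st.secrets["O365_TOKEN"])
# Assumes these IDs are stored as root-level Streamlit secrets; root-level secrets are
# also exported as environment variables, where the loader looks for O365_CLIENT_ID/SECRET
document_library_id = st.secrets["DOCUMENT_LIBRARY_ID"]
folder_id = st.secrets["FOLDER_ID"]

# Create the credentials directory if it does not exist
directory_path = Path.home() / ".credentials"
directory_path.mkdir(parents=True, exist_ok=True)

# Write the O365 token into the text file the loader reads
with open(directory_path / "o365_token.txt", "w") as f:
    json.dump(O365_TOKEN, f)

# Initialize the document loader with the saved token and load the documents
loader = SharePointLoader(
    document_library_id=document_library_id,
    auth_with_token=True,
    folder_id=folder_id,
)
documents = loader.load()
```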

Load the documents with the SharePointLoader. Before initializing the vector database, obtain the API key for the model provider you plan to use. Then initialize the vector database (ChromaDB), specifying the collection name and the embedding model that matches the user-selected model. Pass a directory to the persist_directory parameter to save the vector database to disk. Finally, add the loaded documents to the vector database with generated IDs.
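To make "embed, add with generated IDs, persist to disk" concrete without API keys, here is a toy, stdlib-only stand-in for the Chroma collection. It is not Chroma's or LangChain's API, and several parts are assumptions:

- toy_embed, a hashed bag-of-words, stands in for text-embedding-3-large or embedding-001.
- Documents are plain dicts with hypothetical text, source, and page keys rather than LangChain Document objects.
- IDs are derived with uuid5 from source, page, and text. The article only says "generated ids", so this ID scheme is an assumption.

```python
import hashlib
import json
import math
import uuid
from pathlib import Path


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Hash each word into one of `dim` buckets and count occurrences
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_id(doc):
    """Deterministic ID: the same source, page, and text always give the same ID."""
    key = f"{doc['source']}|{doc['page']}|{doc['text']}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


class ToyCollection:
    """Toy on-disk collection mapping id -> {embedding, text, metadata}, stored as one JSON file."""

    def __init__(self, name, persist_directory):
        self.path = Path(persist_directory) / f"{name}.json"
        # Reload previously persisted records, mimicking a persist_directory
        if self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.records = {}

    def add_documents(self, docs):
        ids = []
        for doc in docs:
            doc_id = make_id(doc)
            # Keyed by ID, so re-adding the same document overwrites rather than duplicates
            self.records[doc_id] = {
                "embedding": toy_embed(doc["text"]),
                "text": doc["text"],
                "metadata": {"source": doc["source"], "page": doc["page"]},
            }
            ids.append(doc_id)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.records), encoding="utf-8")
        return ids
```

In this toy store, re-indexing the same page maps to the same key, so running the indexing step twice does not create duplicate entries.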

Retrieval

When a question is submitted in the app, the RAG pipeline converts it into an embedding with the same embedding model used for indexing. It then performs a vector search that returns the top K documents (the K nearest neighbors) by vector similarity.
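Conceptually, retrieval ranks the stored vectors by similarity to the query vector and keeps the best K. The sketch below does this by brute force with cosine similarity. Chroma itself uses an approximate nearest-neighbor index with a configurable distance metric, so treat this as an illustration of the idea rather than what Chroma computes internally. The function names are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return (index, score) pairs for the k documents most similar to the query, best first."""
    scores = ((i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

heapq.nlargest avoids fully sorting every score when only K results are needed. For a small collection, sorted(...)[:k] would work just as well.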

Generation

The RAG pipeline then passes the retrieved documents (as context), together with the user’s question, to the LLM to generate a response. We also extract the source file and page number from each retrieved document and de-duplicate them. Finally, the response, sources, and pages are passed back to the front end.
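Here is a minimal sketch of the two steps in this paragraph: building the prompt and de-duplicating the source/page pairs. It assumes each retrieved chunk is a dict with hypothetical text, source, and page keys; in LangChain, these would come from a Document's page_content and metadata. The prompt wording is illustrative, and the LLM call itself is omitted.

```python
def build_prompt(question, docs):
    """Stuff the retrieved chunks into the prompt as context, followed by the question."""
    context = "\n\n".join(doc["text"] for doc in docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def unique_sources(docs):
    """De-duplicate (source, page) pairs, keeping the first occurrence in retrieval order."""
    return list(dict.fromkeys((doc["source"], doc["page"]) for doc in docs))
```

dict.fromkeys preserves insertion order, so the source of the most similar chunk is listed first.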

Result

[Image: Result]

Tada! We found the documents!
