
Local Intelligence: How to set up a local GPT Chat for secure & private document analysis workflow

Published at: 6/3/2024
Categories: ai, llm, chat, rag
Author: avatsaev

Intro

In this article, I'll walk you through installing and configuring an open-weights LLM (Large Language Model) such as Mistral or Llama 3 locally, paired with a user-friendly interface for analysing your documents using RAG (Retrieval-Augmented Generation). This setup lets you analyse your documents without sharing your private and sensitive data with third-party AI providers such as OpenAI, Microsoft, or Google.

Prerequisites

  • You can use pretty much any machine, but a machine with a dedicated GPU or Apple Silicon (M1, M2, M3, etc.) is preferable for faster inference.
  • Docker must be preinstalled and running (a quick check is shown below).
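
If you're not sure Docker is set up correctly, here's a quick sanity check from the terminal (these are standard Docker CLI commands):

# Check that the Docker CLI is installed
docker --version

# Check that the Docker daemon is actually running (this fails if it isn't)
docker info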

Installation

Ollama

Ollama is a service that lets us easily manage and run local open-weights models such as Mistral, Llama 3, and more (see the full list of available models).
Installing Ollama is straightforward: just download it from the official website, run the installer, and start the Ollama service; nothing else is needed.
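
Once Ollama is started, you can verify it from the terminal. By default the Ollama service listens on port 11434 (the exact status text may vary between versions):

# Should print the installed Ollama version
ollama --version

# The local Ollama API should answer with a short "Ollama is running" status
curl http://localhost:11434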

Installing the Ollama User Interface

The next step is installing the Ollama User Interface, which runs in Docker, so Docker must be installed and running before installing the Ollama UI.

To install the UI, run the following command in the terminal (it publishes the UI on local port 3000, lets the container reach the host's Ollama service via host.docker.internal, persists the UI's data in the open-webui volume, and disables the login screen with WEBUI_AUTH=false):

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name ollama-webui \
  --restart always \
  -e WEBUI_AUTH=false \
  ghcr.io/open-webui/open-webui:main

This will install and start the Ollama UI web server locally at http://localhost:3000/
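
If the page doesn't load, you can check that the container is up and inspect its logs (standard Docker commands, using the container name from the command above):

# The ollama-webui container should appear with status "Up"
docker ps --filter name=ollama-webui

# Follow the container logs to spot startup errors
docker logs -f ollama-webui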

Download a local model

Now that everything is up and running, we need to download a model.

Good general-purpose models as of this writing (May 2024) are Llama 3 (from Meta) and Mistral. In this article, I'll show how to install Mistral Instruct.

Go to the Ollama library at https://ollama.com/library, type "mistral" in the search bar, then click on the first result:


[Screenshot: Ollama library search results for "mistral"]


Pick the instruct variant in the dropdown menu:


[Screenshot: the tag dropdown with the instruct variant selected]


Copy the name and tag of the model from the right side (don't copy the entire command, just the model_name:tag part):


[Screenshot: the model page with the model_name:tag shown on the right]


In the Ollama UI, click on the username in the bottom-left corner to open the popover menu, then click "Settings":


[Screenshot: the user menu with the "Settings" option]


Then click "Models" in the sidebar. The form below lets us download any model that Ollama supports.

Paste the model tag mistral:instruct in the text field and click download:


[Screenshot: the "Models" settings with the download field]

--

Model installation works the same way for any other model in the Ollama library.
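
If you prefer the terminal over the UI, the same download can be done with Ollama's CLI; models pulled this way show up in the UI as well:

# Download the model (same model_name:tag as in the UI)
ollama pull mistral:instruct

# List all locally installed models
ollama list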

Chat with the model

Once the model is downloaded, you can select it and set it as default:


[Screenshot: selecting the downloaded model in the model picker]


[Screenshot: setting the model as default]


Let's see if everything works by sending a message to the model:


[Screenshot: the model replying to a test message]


Great! The model is loaded and running without any issues 🎉🥳

Now we can do some interesting things with it.
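
Under the hood, the UI talks to Ollama's local REST API, which you can also call directly. Here's a minimal sketch using the /api/generate endpoint (the prompt is just an example; the answer comes back as JSON):

# Ask mistral:instruct a question through the local Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "mistral:instruct",
  "prompt": "Summarise what RAG is in one sentence.",
  "stream": false
}'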

Analyse documents and data - RAG (Retrieval Augmented Generation)

You can upload documents and ask questions about them. Not only that, you can also provide a publicly accessible web URL and ask the model questions about its contents (an online documentation page, for example). All files you add to the chat remain on your machine and are never sent to the cloud.

Example: working with a PDF document

Click the "+" icon in the chat and pick any PDF document you want:


[Screenshot: attaching a PDF via the "+" icon]


I've uploaded the "Attention Is All You Need" paper as a PDF document, and asked a specific question related to this document:

"What is the purpose of multi head attention mechanism?"


[Screenshot: the model's answer to the question]


Let's check whether RAG worked correctly by looking at the original PDF document:


[Screenshot: the relevant passage in the original paper]


The RAG system was able to pinpoint the relevant part of the paper in order to answer the question 🎉

Ask questions about the contents of a Web Page

The URL of the web page must be publicly accessible; if authentication is required to view the page, RAG won't work. If you need to analyse a page protected by auth, a workaround is to first save it as a PDF and upload it as a regular document.

In the chat field, type # followed by a URL. For this example, I'll use Doctolib's FAQ about handling relatives in your Doctolib account:


[Screenshots: entering # followed by the URL, and the model answering questions about the page]


Saving documents to your Workspace

You can also save your most frequently used documents in your workspace so you don't have to upload them every time. To do so, click "Workspace", go to the "Documents" tab, and upload your files there:

[Screenshot: the "Documents" tab in the Workspace]

Later, when you want to work with your documents, just go to the chat and type # in the message field. You'll be presented with all the documents from your workspace, and you can choose to work with one specific document or all of them in a single chat session:

[Screenshot: the workspace document picker shown after typing #]


This is just scratching the surface; the Ollama UI can be configured to make retrieval even more performant with a few tricks. If you're interested in advanced configuration and usage of this workflow, let me know in the comments.

