dev-resources.site

Setup REST-API service of AI by using Local LLMs with Ollama

Published at
5/9/2024
Categories
ai
fastapi
ollama
llama3
Author
evolvedev
Setup REST-API service of AI by using Local LLMs with Ollama

Setting up a REST API service for AI using Local LLMs with Ollama seems like a practical approach. Here’s a simple workflow.

1. Install Ollama and LLMs

Begin by installing Ollama and the Local LLMs on your local machine. Ollama facilitates the local deployment of LLMs, making it easier to manage and utilize them for various tasks.

Install Ollama
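On Linux, Ollama can be installed with the official script from ollama.com (macOS and Windows installers are available on the same site):

```shell
# Official Linux install script from ollama.com
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation
ollama --version
```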

Install LLMs for Ollama

ollama pull llama3
ollama run llama3

Ollama Commands

Available Commands:
  /set         Set session variables
  /show        Show model information
  /bye         Exit
  /?, /help    Help for a command

Use """ to begin a multi-line message

Test Ollama

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": true
}'

If stream is set to false, the response will be a single JSON object instead of a stream of chunks:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
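When stream is true, Ollama replies with newline-delimited JSON objects, each carrying a "response" fragment and a final object with "done": true. Here is a minimal sketch of reassembling such a stream into the full answer (field names follow Ollama's /api/generate response format; the sample chunks below are made up for illustration):

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the 'response' fragments of an Ollama NDJSON stream."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk also carries timing stats
            break
    return "".join(parts)

# Made-up sample chunks in the shape Ollama streams them:
sample = [
    '{"model": "llama3", "response": "The sky ", "done": false}',
    '{"model": "llama3", "response": "is blue.", "done": true}',
]
print(assemble_stream(sample))  # -> The sky is blue.
```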

2. Set Up FastAPI:

Set up a Python FastAPI application. FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. It’s a great choice for building robust and efficient APIs.

Develop the FastAPI routes and endpoints to interact with the Ollama server. This involves sending requests to Ollama for tasks such as text generation, language understanding, or any other AI-related task supported by your LLMs. The following code is a simple example. (You can also use the official Ollama Python library instead of raw HTTP requests.)

from typing import Union
from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI(debug=True)

class Itemexample(BaseModel):
    name: str
    prompt: str
    instruction: str
    is_offer: Union[bool, None] = None

class Item(BaseModel):
    model: str
    prompt: str

# Ollama's generate endpoint; add more URLs here for other local servers.
urls = ["http://localhost:11434/api/generate"]

headers = {
    "Content-Type": "application/json"
}

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.post("/chat/{llms_name}")
def update_item(llms_name: str, item: Item):
    if llms_name == "llama3":
        url = urls[0]
        payload = {
            "model": item.model,
            "prompt": item.prompt,  # forward the caller's prompt to Ollama
            "stream": False
        }
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 200:
            return {"data": response.text, "llms_name": llms_name}
        print("error:", response.status_code, response.text)
        return {"item_name": item.model, "error": response.status_code, "data": response.text}
    return {"item_name": item.model, "llms_name": llms_name}
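Assuming the code above is saved as main.py, the service can be started with uvicorn (install the dependencies first):

```shell
# Install dependencies (one-time)
pip install fastapi uvicorn requests

# Start the API on http://127.0.0.1:8000; --reload restarts on code changes
uvicorn main:app --reload
```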

Test REST-API service

curl --location 'http://127.0.0.1:8000/chat/llama3' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

3. Deploy:

Once you’re satisfied with the functionality and performance of the REST API, the service can be deployed to a production environment if needed. This might involve deploying it to a cloud platform, containerizing it with Docker, or running it on a dedicated server.
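As one containerization option, here is a minimal Dockerfile sketch (assuming the app is saved as main.py; note that the container must be able to reach the Ollama server, e.g. via host networking or by making the Ollama URL configurable instead of hardcoding localhost):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn requests
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```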

In this simple example, by leveraging Ollama for local LLM deployment and integrating it with FastAPI for building the REST API server, you’re creating a free solution for AI services. The model can also be fine-tuned on your own training data for customized purposes (we will discuss this in a future article).

Happy reading, happy coding.

References:

  1. Ollama

  2. Ollama GitHub

  3. Ollama Python Lib

  4. FastAPI

  5. Ollama Installation Guide
