
#SemanticKernel: Local LLMs Unleashed on #RaspberryPi 5

Published at: 5/9/2024
Categories: englishpost, codesample, llama3
Author: elbruno

Hi!

Welcome to the exciting world of local Large Language Models (LLMs) where we’re pushing the boundaries of what’s possible with AI.

Today let’s talk about a cool topic: running models locally, especially on devices like the Raspberry Pi 5. Let’s dive into the future of AI, right in our own backyards.

Ollama and using Open Source LLMs

Ollama stands out as a platform that simplifies the process of running open-source LLMs locally on your machine. It bundles model weights, configuration, and data into a single package, making it accessible for developers and AI enthusiasts alike. The key benefits of using Ollama include a simple command-line interface, a local REST API, and an OpenAI-compatible endpoint that other tools can call.
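To give an idea of what such a package looks like, here is a minimal sketch of an Ollama Modelfile that derives a custom model from llama3; the model name pi-assistant, the temperature value, and the system prompt are just example values, not taken from the original post:


FROM llama3
PARAMETER temperature 0.7
SYSTEM You are a concise assistant running on a Raspberry Pi.

It can then be built with ollama create pi-assistant -f Modelfile and started with ollama run pi-assistant.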

Using Local LLMs like Llama3 or Phi-3

Local LLMs like Llama3 and Phi-3 represent a significant shift towards more efficient and compact AI models. Llama3, Meta's open model family, provides high-quality outputs with a comparatively small parameter count, especially in its 8B variant.

Phi-3, developed by Microsoft, is a family of small language models trained on carefully curated data to maximize efficiency; combined with quantization, this makes it ideal for deployment on a wide range of devices.

The use of local LLMs offers several advantages: prompts and data never leave your device, the models keep working without an internet connection, and there are no per-token costs.

How to Set Up a Local Ollama Inference Server on a Raspberry Pi 5

I have already written, a couple of times, my own version of the first-time setup for a Raspberry Pi (link). Once the device is ready, setting up Ollama on a Raspberry Pi 5 (or an older model) is a straightforward process. Here’s a quick guide to get you started:

Installation: Use the official Ollama installation script to install it on Raspberry Pi OS.

The main command is:


curl -fsSL https://ollama.com/install.sh | sh


Running Models: After installation, you can run various LLMs such as tinyllama, phi, and llava, depending on your RAM capacity.

For example, to install and run Llama 3, we can use the following command:


ollama run llama3


Once Ollama is installed and a model has been downloaded, the console should look similar to this:

[Screenshot: log view of the Ollama installation and a Llama 3 model run]

For a detailed step-by-step guide, including setting up Docker and accessing the Ollama WebUI, check out the resources available on GitHub.

Tip: to watch the real-time journal of the Ollama service, we can run this command:


journalctl -u ollama -f


Important: by default, the Ollama server only accepts local calls. To enable access from other machines, you need to follow these steps:

– Edit the systemd service by running this command:


sudo systemctl edit ollama.service


– This will open an editor.

– Add an Environment line under the [Service] section:


[Service]
Environment="OLLAMA_HOST=0.0.0.0"


– Save and exit.

– Reload systemd and restart Ollama:
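The standard systemd commands for that are:


sudo systemctl daemon-reload
sudo systemctl restart ollama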

More information is available in the Ollama FAQ.
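Once the server accepts remote calls, a quick check from another machine against Ollama's OpenAI-compatible endpoint confirms that everything is reachable. A sketch, assuming the Raspberry Pi answers as raspberrypi.local on the default port 11434:


curl http://raspberrypi.local:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama3", "messages": [ { "role": "user", "content": "Say hello from the Pi" } ] }'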

How to Use Semantic Kernel to Call a Chat Generation from a Remote Server

Let’s switch and write some code. This is a “Hello World” sample using Semantic Kernel and Azure OpenAI Services.

You can learn more about these AI samples at https://aka.ms/dotnet-ai.
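The sample itself is not reproduced here, so this is a minimal sketch of that kind of hello-world using the Azure OpenAI connector; the deployment name, endpoint, and key are placeholders, not values from the original post:


// Minimal Semantic Kernel "Hello World" against Azure OpenAI.
// Requires the Microsoft.SemanticKernel NuGet package.
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",                             // placeholder deployment name
    endpoint: "https://YOUR-RESOURCE.openai.azure.com/",  // placeholder endpoint
    apiKey: "YOUR-AZURE-OPENAI-KEY");                     // placeholder key
var kernel = builder.Build();

// Send a single prompt and print the answer.
var response = await kernel.InvokePromptAsync("Write a short greeting from a Raspberry Pi fan.");
Console.WriteLine(response.GetValue<string>());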

Now, to use a remote LLM, such as Llama 3 running on a Raspberry Pi, we can point the builder at a chat completion service that follows the OpenAI API specification. The only change needed is the service registration, shown in the sketch below:
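This is a sketch of that change, assuming the Raspberry Pi is reachable as raspberrypi.local, that Ollama serves llama3 on the default port 11434, and that your Semantic Kernel version includes the experimental custom-endpoint overload (SKEXP0010); depending on the connector version, the endpoint may need a /v1 suffix. The sketch also wraps the call in a Stopwatch, which is useful for the timing discussed below:


// Same hello-world, but the chat completion service now points at the
// OpenAI-compatible endpoint exposed by Ollama on the Raspberry Pi.
// Hostname, port, and model name below are assumptions; adjust to your setup.
using System.Diagnostics;
using Microsoft.SemanticKernel;

#pragma warning disable SKEXP0010 // custom OpenAI endpoints are experimental

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "llama3",
    endpoint: new Uri("http://raspberrypi.local:11434"),
    apiKey: "ollama"); // Ollama does not validate the key; any placeholder works
var kernel = builder.Build();

// A Stopwatch gives a rough idea of how long the remote call takes.
var stopwatch = Stopwatch.StartNew();
var response = await kernel.InvokePromptAsync("Write a short greeting from a Raspberry Pi fan.");
stopwatch.Stop();

Console.WriteLine(response.GetValue<string>());
Console.WriteLine($"Elapsed: {stopwatch.Elapsed.TotalSeconds:F1} s");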

That does the trick, and the only real change is that single service registration line.

We also have the question of performance: adding a Stopwatch, as in the sketch above, gives a sense of the time elapsed for the call. For this simple prompt, the response arrives in around 30-50 seconds.

Not bad at all for a small device!

Conclusion

The advent of local LLMs, and of tools like Ollama to run them, is revolutionizing the way we approach AI, offering unprecedented opportunities for innovation and privacy. Whether you’re a seasoned developer or just starting out, the potential of local AI is immense and waiting for you to explore.


This blog post was generated using information from various online resources, including cheatsheet.md, anakin.ai, and techcommunity.microsoft.com, to provide a comprehensive guide on local LLMs and Ollama.

Happy coding!

Greetings

El Bruno

