dev-resources.site

Why Every GPU will be Virtually Attached over a Network

Published at
1/8/2025
Categories
cloud
ai
devops
gpu
Author
carl_petey

Introducing GPU virtualization

Virtualization is the practice of creating virtual representations of physical hardware. Although it is most commonly associated with virtual machines (VMs), it extends to other hardware, including GPUs. GPU virtualization is essential for sharing resources efficiently in high-performance computing, AI, and machine learning, but the term is often misunderstood because it covers several different techniques.

Existing types of GPU Virtualization

GPU virtualization currently exists in three main forms:

  1. Single-node GPU sharing
  2. Dedicated GPU passthrough
  3. Network-based GPU pooling (Thunder Compute’s approach)

The first two operate within a single physical server and are widely used today. Thunder Compute is pioneering the third approach, which operates across multiple servers, or ‘nodes’.

Single-node GPU sharing (e.g., NVIDIA vGPU)

Single-node GPU sharing divides a physical GPU into multiple virtual GPUs. This allows several VMs to use portions of the same GPU simultaneously, improving resource utilization when no single VM needs the full power of a GPU.

Dedicated GPU passthrough (e.g., Intel GVT-d)

Dedicated GPU passthrough assigns an entire physical GPU to a single VM. Although this doesn’t split the GPU, it’s considered virtualization because it lets a VM access the GPU directly, providing near-native performance for applications that require the full power of a GPU.

The third approach, network-based GPU pooling, is a newer concept that requires deeper explanation.

A new approach: Network-Based Virtualization

At its core, network-based GPU virtualization works by extending physical PCIe connections with virtual connections over a network.

In practice, this means that any computer can access any GPU across a network. Traditionally, adding a GPU to a server requires physically connecting it to the motherboard. With network-based virtualization, a virtual GPU can be “plugged in” via software, behaving just like a physically connected GPU.

This solution acts as a bridge between the application and the GPU. It replaces the standard GPU software interface (like NVIDIA CUDA) with a network-aware version. This allows applications to interact with GPUs on remote servers as if they were locally attached.

The end result is that a computer without a physical GPU can behave as if it had one, without any hardware changes. This creates a flexible, distributed pool of GPU resources that can be dynamically allocated and shared across the network.
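The bridging idea above can be illustrated with a minimal sketch. It is not Thunder Compute's implementation and does not use CUDA: `LocalGPU`, `GPUServer`, and `RemoteGPU` are hypothetical names, the "device" is plain Python running on the CPU, and the transport is an ordinary function call standing in for a network round trip. What it shows is the pattern: a client-side stub exposes the same methods as the local device and forwards each call as a serialized message, so the application code does not change.

```python
import json


class LocalGPU:
    """Stand-in for a locally attached device (hypothetical, runs on the CPU)."""

    def __init__(self):
        self._buffers = {}
        self._next_handle = 1

    def upload(self, values):
        # Store the data on the "device" and return an opaque handle.
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = list(values)
        return handle

    def vector_add(self, a, b):
        # Element-wise add of two device buffers; the result stays on the device.
        x, y = self._buffers[a], self._buffers[b]
        return self.upload(p + q for p, q in zip(x, y))

    def download(self, handle):
        return self._buffers[handle]


class GPUServer:
    """Runs on the machine that owns the GPU: decodes a request and executes it."""

    ALLOWED = {"upload", "vector_add", "download"}

    def __init__(self, device):
        self._device = device

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        if request["method"] not in self.ALLOWED:
            raise ValueError("unknown method: " + request["method"])
        result = getattr(self._device, request["method"])(*request["args"])
        return json.dumps({"result": result}).encode()


class RemoteGPU:
    """Client-side stub with the same methods as LocalGPU."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> response bytes

    def _call(self, method, *args):
        payload = json.dumps({"method": method, "args": list(args)}).encode()
        return json.loads(self._transport(payload))["result"]

    def upload(self, values):
        return self._call("upload", list(values))

    def vector_add(self, a, b):
        return self._call("vector_add", a, b)

    def download(self, handle):
        return self._call("download", handle)


def run_workload(gpu):
    # Application code: identical for a local or a remote device.
    a = gpu.upload([1, 2, 3])
    b = gpu.upload([10, 20, 30])
    return gpu.download(gpu.vector_add(a, b))


server = GPUServer(LocalGPU())
remote = RemoteGPU(transport=server.handle)  # a real system would use a socket here

print(run_workload(LocalGPU()))  # [11, 22, 33]
print(run_workload(remote))      # [11, 22, 33]
```

Note that intermediate results are referred to by handle and stay on the server side; only the final `download` moves data back to the client. In this sketch every call still costs one round trip.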

Why Network-Distributed GPU Virtualization is a Game-Changer

Traditional GPU virtualization is limited by physical hardware constraints, typically supporting a maximum of 8 GPUs per server. Expanding GPU capacity requires vertical scaling, which involves upgrading individual servers. However, this method often leads to inefficient resource utilization as VMs tend to reserve entire GPUs.

A network-distributed approach overcomes these limitations by enabling GPUs to be accessed across multiple servers (also called ‘nodes’) in a data center. This creates a data center-wide pool of GPU resources, rather than limiting each server to its own physically attached GPUs.

This ability to expand GPU resources by adding more servers (known as horizontal scalability) allows for flexible, on-demand allocation of GPU power. It dramatically increases efficiency by ensuring GPUs are used to their full capacity across the entire data center.
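To make the utilization argument concrete, here is a small sketch comparing the two placement models. The job sizes and server count are made-up numbers chosen for illustration, and `place_per_server` and `place_pooled` are hypothetical helpers, not part of any real scheduler. The only figure taken from the text is the limit of 8 GPUs per server.

```python
def place_per_server(jobs, num_servers, gpus_per_server):
    """First-fit placement where a job must fit entirely on one server."""
    free = [gpus_per_server] * num_servers
    placed = []
    for need in jobs:
        for i, available in enumerate(free):
            if need <= available:
                free[i] -= need
                placed.append(need)
                break
    return placed


def place_pooled(jobs, num_servers, gpus_per_server):
    """Placement where a job may draw GPUs from anywhere in the pool."""
    free = num_servers * gpus_per_server
    placed = []
    for need in jobs:
        if need <= free:
            free -= need
            placed.append(need)
    return placed


jobs = [6, 6, 4]          # GPUs requested by each job (hypothetical)
servers, per_server = 2, 8
total = servers * per_server

local = place_per_server(jobs, servers, per_server)
pooled = place_pooled(jobs, servers, per_server)

print(sum(local) / total)   # 0.75: the 4-GPU job fits on neither server
print(sum(pooled) / total)  # 1.0: the leftover 2 + 2 GPUs are combined
```

In the per-server model, each server is left with 2 idle GPUs that no waiting job can use. In the pooled model, the same 4 GPUs serve the third job.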

Comparing Network-based virtualization to Similar Technologies

To conceptualize network virtualization, it is helpful to look at some existing solutions for attaching GPUs and other hardware across networks:

  • InfiniBand (sold by NVIDIA, among others): This is a high-speed networking technology that allows for faster communication between servers in a data center. While it improves the connection speed for GPU systems spread across multiple servers, it doesn’t address the core issue of efficiently allocating GPU resources among different applications or users.
  • Storage Area Networks (SANs): SANs pool storage devices across a network, allowing VMs to access only the storage they need without reserving excess capacity. Thunder Compute’s GPU virtualization operates on a similar principle, enabling precise GPU resource allocation with minimal idle time.

The Future of GPU Virtualization

As with other virtualization technologies, network-based GPU virtualization faces performance challenges but continues to improve. Early tests from Thunder Compute, the startup building this technology, showed AI inference tasks running 100 times slower than on physically attached hardware. Within a month, performance improved to roughly 2 times slower for most AI workloads.
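The article does not say where the overhead came from or how it was reduced. As a rough illustration only, the sketch below assumes a simple model in which each remoted call pays one network round trip on top of the GPU compute time. All numbers are hypothetical and were chosen to reproduce the two slowdown factors quoted above; they are not measurements.

```python
def slowdown(compute_us, round_trip_us, num_calls):
    """Ratio of remote run time to local run time under a per-call latency model."""
    remote_us = compute_us + num_calls * round_trip_us
    return remote_us / compute_us


# Hypothetical workload: 10 ms of GPU compute, 100 microsecond network round trip.
compute_us = 10_000
round_trip_us = 100

print(slowdown(compute_us, round_trip_us, num_calls=9_900))  # 100.0
print(slowdown(compute_us, round_trip_us, num_calls=100))    # 2.0
```

Under this assumed model, the slowdown depends on how many round trips a workload makes rather than on the amount of computation, so sending fewer, larger requests would close most of the gap.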

This rapid progress points to a future where network-virtualized GPUs will match the performance of physically attached GPUs. As the technology matures, applications will extend beyond data centers to slower networks, including connections between data centers and even home networks. We envision a future where developers can access vast GPU resources from their laptops over standard WiFi connections.
