dev-resources.site

CPUs, GPUs, TPUs, DPUs, why?

Published at
12/2/2024
Categories
cpu
gpu
deeplearning
computerscience
Author
7oby

For me (at the time, at least), things were as simple as this: there were only two kinds of processing units, the CPU and the GPU. CPUs were the standard, and GPUs were used exclusively for everything graphics related (like video games).

But then I slid into the machine learning field and started hearing my peers talk about TPUs, and sometimes about a mysterious DPU. Sometimes they even struggled to decide which one to use to train their machine learning models.

Which ones are those two again? And how are they different from the others?
I had to find answers to those questions. So I did some googling on the matter and found out MY WHOLE LIFE WAS A LIE.

CPUs, GPUs, TPUs, DPUs: none of these is technically better than the others. Each of them is simply built or optimized for a particular purpose, which it excels at compared to the rest.

So what is the purpose of having a CPU?

First you have to understand that processing units have what we call cores, and each core can perform only one task at a time. So the more cores a PU has, the more tasks it can perform in parallel (i.e. at the same time).
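The idea can be sketched in plain Python (the function names here are purely illustrative): a task that splits into independent chunks can be handed to several workers at once, one per core, and the partial results merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker (think: core) handles its own chunk, independent of the rest.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(numbers, workers=4):
    # Split the work into chunks, one per worker, then merge the results.
    size = max(1, len(numbers) // workers)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

# Same answer as a plain loop, but the chunks can run at the same time.
total = parallel_sum_of_squares(list(range(1_000)))
```

The key property is that no chunk depends on another chunk's result; that independence is what makes a task parallelizable at all.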

By far, CPUs have always been best suited for the most logic-heavy computer operations you can think of. That's because the high-performance cores of a CPU are optimized for sequential (step-by-step) processing and low-latency branching.
So CPUs are the best option for complex operations like database query execution, operating system management, compiling code, encryption and decryption, data serialization/deserialization, file system operations, real-time system operations, error detection and correction, simulation of rule-based systems, and so on; in short, any task requiring extensive if-else or switch-case branching logic that must be executed step by step.
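As a toy illustration of that kind of work (hypothetical names, minimal sketch), here is branch-heavy, state-dependent logic of the sort a CPU core handles well: each step depends on the state left behind by the previous one, so the loop cannot be split across thousands of simple cores.

```python
def run_rule_machine(events):
    # A tiny rule-based state machine. Every iteration reads the state
    # written by the previous iteration, so the steps MUST run in order.
    state = "idle"
    for ev in events:
        if state == "idle" and ev == "start":
            state = "running"
        elif state == "running" and ev == "pause":
            state = "paused"
        elif state == "paused" and ev == "start":
            state = "running"
        elif ev == "stop":
            state = "idle"
    return state
```

There is no way to process `events[3]` before `events[2]` is done; this sequential dependency, plus the dense branching, is exactly what CPU cores are optimized for.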

The annoying thing about sequential processing, though, is that it can take time, so much time… especially for massive tasks like graphics rendering.
Imagine having to rely on a CPU's few cores for a graphics-heavy video game like Cyberpunk 2077. It would lag so much that you'd never want to play a video game again in your life, lol.
Thousands of cores would be needed to render the graphics of such a game in real time, avoiding any lag and achieving an ideal user experience.

“Why don't we add more cores to the CPU, then?” you may ask.
Well, as we (implicitly) stated earlier, the cores of a CPU are large and resource-intensive, with quite a complex architecture, which limits how many can fit on a chip (typically 4–32). Scaling such cores up would also increase power consumption and heat generation, leading to diminishing returns.

But there is good news! Many computer operations like graphics rendering involve tasks that are highly repetitive and can be broken down into many smaller, independent operations that do not require a lot of branching or conditional logic.

If we cannot scale the number of large cores, how about building tinier cores with a simpler architecture, so that we could pack thousands of them together to complete repetitive tasks in parallel? That would solve many problems, like the graphics rendering one. Yes, and we would call that the Graphics Processing Unit.

The need for a Graphics Processing Unit

Enter the Graphics Processing Unit (GPU), a chip built for parallel computing. While consumer CPUs top out at a few dozen cores, modern GPUs like NVIDIA's RTX 4080 boast nearly 10,000 cores, each able to perform a computation simultaneously.
And while GPUs were originally built to solve the graphics rendering problem (calculating lights, shadows, and textures in real time), their massive core counts and parallelism make them an option for other tasks, like AI and deep learning (performing the massive matrix multiplications needed to train models).
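To see why that workload fits, here is a minimal sketch in NumPy (running on the CPU, but illustrating the pattern; names are illustrative): a neural network layer's forward pass is one big matrix multiplication, and every element of the output can be computed independently, which is exactly the shape of work a GPU spreads across its thousands of cores.

```python
import numpy as np

def dense_forward(inputs, weights, bias):
    # One dense layer: a matrix multiply plus a ReLU activation.
    # Each of the batch_size * out_features output values is an
    # independent dot product -- ideal for massive parallelism.
    return np.maximum(inputs @ weights + bias, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 128))    # a batch of 64 samples, 128 features each
w = rng.normal(size=(128, 256))   # weights mapping 128 -> 256 features
b = np.zeros(256)
out = dense_forward(x, w, b)      # shape (64, 256)
```

On a GPU, a framework like PyTorch or JAX dispatches this same computation so that thousands of output elements are calculated at once instead of one after another.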

Then Why the TPU?

The Tensor Processing Unit (TPU) is a specialized chip designed for deep learning. Created by Google in 2016, TPUs focus on the tensor operations (like matrix multiplication) at the heart of AI models. Unlike GPUs, TPUs use a systolic array design in which intermediate results flow directly between compute units instead of round-tripping through shared memory or registers, making them far more efficient for neural network workloads.

If you’re working with massive datasets or training large models, TPUs can save you significant time and resources.
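A toy model of the systolic-array idea (plain NumPy, purely illustrative, not how a TPU is programmed): partial products are accumulated in place as data flows through the grid, step by step, rather than each intermediate result being written back out and re-read.

```python
import numpy as np

def systolic_matmul(a, b):
    # Toy sketch of a systolic array computing a @ b: on each step, one
    # "wave" of values flows through and every accumulator updates in
    # place, so intermediate sums never leave the compute grid.
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((n, m))
    for step in range(k):
        # Rank-1 update: the step-th column of a meets the step-th row of b.
        acc += np.outer(a[:, step], b[step, :])
    return acc
```

The result is identical to an ordinary matrix multiply; the win on real hardware comes from keeping those accumulations local to the array.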

DPUs: The Data Center Workhorse

The Data Processing Unit (DPU) is the newest member of the computing family, optimized for data-intensive tasks in cloud environments. These chips handle:

  • Networking: Packet processing, routing, and security.
  • Data storage: Compression and encryption.

DPUs offload these tasks from CPUs, allowing the CPU to focus on general-purpose computing. While you won’t find a DPU in your laptop, they are essential for modern data centers where efficiency is paramount.

That's it!

If you didn't already, now you know the difference between these processing units. CPUs remain the backbone of general-purpose computing, excelling in logic-heavy, sequential tasks. GPUs, with their massively parallel architecture, dominate in rendering, machine learning, and any workload requiring high-throughput computations. TPUs are purpose-built for deep learning, offering unmatched efficiency in training and inference for neural networks. DPUs, on the other hand, cater to data-centric operations, offloading tasks like networking and storage from CPUs to boost efficiency in large-scale data centers.

The next article

on this subject will be for machine learning engineers: we'll discuss how to choose the right processing unit for your AI workload, explore the roles of GPUs, TPUs, and CPUs in training and inference, compare their performance, cost, and scalability, and dive into practical tips for optimizing your machine learning pipelines for maximum efficiency.
Keep in touch!
