dev-resources.site
Nvidia's 1000x Performance Boost Claim Verified

Published: 6/4/2024
Categories: ai, machinelearning, marketing, hardware
Author: maximsaplin

Nvidia's keynote at the recent Computex was full of bold marketing and messaging, bordering on complete BS.

CEO Math Lesson

The "CEO Math" lesson with the "The more you buy, the more you save" conclusion has reminded me of another bold claim (and play with the numbers) from earlier this year.

At Blackwell's intro, one of the slides claimed a 1000x boost in the compute power of Nvidia GPUs. But many noticed the comparison was not apples-to-apples: FP16 performance of the older generations was compared against the smaller FP8 and FP4 data types introduced in the newer hardware. Naturally, lower-precision computation is faster. The graph would be much nicer if the FP16 line continued, like this:

Blackwell FP16 performance

It is great that the new hardware has acceleration for smaller data types. It follows the trend of quantized language models - trading off slight LLM performance degradation for smaller size and faster inference. Yet presenting the figures the way they were presented:

  • not explaining the difference in data types,
  • hiding the baseline and breaking consistency,
  • not highlighting the downside of decreased precision...

... seems like a sketchy move worthy of the book "How to Lie with Statistics".
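The precision downside is easy to demonstrate even with Python's standard library, which can round-trip a value through IEEE 754 half precision (FP16) via `struct`'s `'e'` format. FP8 and FP4 would lose even more digits; this is just a sketch of the general effect, not Nvidia's implementation:

```python
import struct

def roundtrip_half(x: float) -> float:
    """Round a Python float (FP64) to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# FP16 has only 10 fraction bits, so 0.1 can't be represented exactly
print(roundtrip_half(0.1))  # 0.0999755859375
```

The same value that prints as 0.1 in double precision comes back visibly off in FP16; quantized models accept exactly this kind of error in exchange for speed and memory.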


Anyways... To come up with the above FP16 numbers for Hopper and Blackwell, I found the specs for the products that had 4,000 TFLOPS FP8 and 20,000 TFLOPS FP4.

They are:

  • H100 SXM: FP8 3,958 teraFLOPS and FP16 1,979 teraFLOPS
  • GB200 NVL2: dual-GPU system with FP4 40 PFLOPS and FP16 10 PFLOPS (5,000 FP16 teraFLOPS per GPU)

The improvement in performance is still impressive, yet 1000x is way nicer than a mere 263x ;)
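The arithmetic behind those two ratios can be sketched in a few lines. The baseline is my assumption, not something labeled on the slide: Pascal (P100) at roughly 19 FP16 teraFLOPS, which is what makes the keynote's 1000x figure come out:

```python
# Rough sanity check of the keynote's speedup math.
# Baseline assumption (mine, not from the slide): Pascal P100 at ~19 FP16 TFLOPS.
pascal_fp16 = 19          # TFLOPS, assumed baseline

blackwell_fp4 = 20_000    # TFLOPS per GPU, the figure from the slide
blackwell_fp16 = 5_000    # TFLOPS per GPU (GB200 NVL2: 10 PFLOPS across 2 GPUs)

# Mixed-precision comparison, the way the keynote presented it
print(f"FP4 vs FP16 baseline:  {blackwell_fp4 / pascal_fp16:.0f}x")   # ~1053x
# Apples-to-apples FP16 comparison
print(f"FP16 vs FP16 baseline: {blackwell_fp16 / pascal_fp16:.0f}x")  # ~263x
```

Comparing FP16 to FP16 shrinks the headline number by a factor of four, which is the whole point of the chart's missing line.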
