LLM Fine-tuning on RTX 4090: 90% Performance at 55% Power

Published at
5/29/2024
Categories
ai
machinelearning
hardware
performance
Author
maximsaplin

At just a fraction of its rated power, the RTX 4090 is capable of delivering almost full performance.

While running SFT (supervised fine-tuning) via Hugging Face's TRL library (using Torch as a backend), I decided to move the Afterburner power slider down:

Afterburner
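Afterburner is a GUI tool, so for scripted runs the same idea can be sketched with the `nvidia-smi` command line. This is not what I used in the post; it is a minimal sketch under assumptions: the helper names are hypothetical, the 440W baseline comes from the observation later in this article, and changing the limit typically requires admin/root rights and a card whose driver allows it.

```python
import subprocess


def power_limit_watts(percent, full_power_w):
    """Convert a percentage power limit into whole watts."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return round(full_power_w * percent / 100)


def nvidia_smi_command(watts, gpu_index=0):
    """Build the nvidia-smi call that sets a power limit (in watts) for one GPU."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(percent, full_power_w, gpu_index=0):
    """Run nvidia-smi; raises CalledProcessError if the driver rejects the limit."""
    cmd = nvidia_smi_command(power_limit_watts(percent, full_power_w), gpu_index)
    subprocess.run(cmd, check=True)


# Only builds and prints the command; call apply_power_limit() to actually set it.
# Assumption: 100% corresponds to roughly 440W on this particular card.
print(nvidia_smi_command(power_limit_watts(50, 440)))
```

The limit set this way is usually reset on reboot, which makes it easy to experiment with.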

Then I checked the wandb dashboards for changes in training speed (epochs per hour) and GPU power:

epoch/time

GPU power/time
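For a quick check outside of wandb, the same readings can be sampled from `nvidia-smi`. The sketch below is an illustration rather than part of my setup: the function names are hypothetical and the sample line is made up to show the expected CSV shape.

```python
import subprocess

# Query power draw (W), temperature (°C) and utilization (%) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line such as '221.35, 58, 91' into a dict of floats."""
    power_w, temp_c, util_pct = (field.strip() for field in line.split(","))
    return {
        "power_w": float(power_w),
        "temp_c": float(temp_c),
        "util_pct": float(util_pct),
    }


def read_gpu_sample(gpu_index=0):
    """Run nvidia-smi for one GPU and return the parsed sample."""
    result = subprocess.run(
        QUERY + ["-i", str(gpu_index)],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])


# Hypothetical output line, used here so the example runs without a GPU.
print(parse_sample("221.35, 58, 91"))
```

Calling `read_gpu_sample()` in a loop during training gives a rough power-over-time trace to compare against the epoch rate.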

Here's the full table with performance (and a few other measurements) at different power levels:

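| Power, W | Temp, °C | Afterburner PWR % | Perf (epochs/hour) | Perf/kW (epochs/hour per kW) | Power % (of 390W) | Perf % (of max) |
|---|---|---|---|---|---|---|
| 390 | 72 | 100% | 0.442 | 1.134 | 100.0% | 100.0% |
| 330 | 70 | 80% | 0.436 | 1.322 | 84.6% | 98.6% |
| 300 | 67 | 70% | 0.413 | 1.378 | 76.9% | 93.4% |
| 260 | 62 | 60% | 0.405 | 1.557 | 66.7% | 91.5% |
| 240 | 60 | 55% | 0.394 | 1.644 | 61.5% | 89.2% |
| 220 | 58 | 50% | 0.365 | 1.660 | 56.4% | 82.6% |
| 180 | 52 | 40% | 0.271 | 1.508 | 46.2% | 61.3% |
| 150 | 47 | 33% | 0.221 | 1.473 | 38.5% | 49.9% |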
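The derived columns can be recomputed from the two measured ones (power and epochs per hour). Here is a small sketch; because it starts from the rounded epochs/hour values shown in the table, the last digit may differ slightly from the table itself.

```python
# (GPU power in W, training speed in epochs/hour), copied from the table above.
measurements = [
    (390, 0.442), (330, 0.436), (300, 0.413), (260, 0.405),
    (240, 0.394), (220, 0.365), (180, 0.271), (150, 0.221),
]

# The first row (no power limit) is the 100% baseline for both percentages.
base_power_w, base_perf = measurements[0]

rows = []
for power_w, perf in measurements:
    rows.append({
        "power_w": power_w,
        "perf_per_kw": perf / (power_w / 1000),   # epochs/hour per kW
        "power_pct": 100 * power_w / base_power_w,
        "perf_pct": 100 * perf / base_perf,
    })

for r in rows:
    print(f"{r['power_w']:>4} W  {r['perf_per_kw']:.3f}  "
          f"{r['power_pct']:5.1f}%  {r['perf_pct']:5.1f}%")

# The most efficient setting is the one with the highest epochs/hour per kW.
best = max(rows, key=lambda r: r["perf_per_kw"])
print("Most efficient power level:", best["power_w"], "W")
```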

If you run long training sessions on your RTX 4090 PC and would like to save on electricity bills or keep your room cooler (a 500W midi tower is quite a heater), limiting GPU power to 50-60% makes total sense.

Besides, there's a sweet spot at 50% (220W) with maximum efficiency (performance per watt, or trained epochs per kWh). At this power level, you still get 82.6% of the max speed.
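To put the trade-off in numbers, here is a rough sketch that compares energy and time per epoch at the unrestricted 390W and at the 220W sweet spot. It uses the measurements from the table and counts GPU power only; the rest of the system (CPU, fans, PSU losses) is not included, so real savings at the wall will differ.

```python
def energy_per_epoch_kwh(power_w, epochs_per_hour):
    """GPU energy needed to train one epoch, in kWh."""
    return (power_w / 1000) / epochs_per_hour


def hours_per_epoch(epochs_per_hour):
    """Wall-clock time needed to train one epoch, in hours."""
    return 1 / epochs_per_hour


# (GPU power in W, epochs/hour) taken from the table above.
full_power, full_perf = 390, 0.442
limited_power, limited_perf = 220, 0.365

e_full = energy_per_epoch_kwh(full_power, full_perf)
e_limited = energy_per_epoch_kwh(limited_power, limited_perf)

# Relative GPU energy saved per epoch, and relative extra training time.
saving_pct = 100 * (1 - e_limited / e_full)
extra_time_pct = 100 * (hours_per_epoch(limited_perf) / hours_per_epoch(full_perf) - 1)

print(f"Energy per epoch: {e_full:.3f} kWh -> {e_limited:.3f} kWh")
print(f"GPU energy saved per epoch: {saving_pct:.1f}%")
print(f"Extra training time: {extra_time_pct:.1f}%")
```

In other words, at 220W each epoch costs roughly a third less GPU energy in exchange for about a fifth more training time.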

A Few Notes on RTX 4090 Power Levels

Most desktop RTX 4090 cards are rated at 450W, including mine (a Palit 4090 flashed with an Asus 450W 1.1v BIOS). There are versions with 500W, 600W and even 666W power limits.

I could see 450W power consumption in the OCCT synthetic benchmark. In 3DMark Time Spy, the maximum power observed was around 430W.

While running the training above (full fine-tuning), the maximum reported power was 390W: TRL/Torch was not able to fully utilize the GPU (actual utilization was around 90%). This can be explained by VRAM not being filled completely (~20GB out of 24GB). It could be fixed by increasing the batch size training parameter, at the risk of VRAM overflowing into shared memory and significantly slowing down the whole training run. On some other occasions, I could see 410-420W from TRL running LoRA fine-tuning.

Based on the actual GPU power reported, it seems that Afterburner calculates its power limits assuming that 100% is 440W.
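That estimate can be sanity-checked by dividing each measured power by its Afterburner percentage. The sketch below uses the table rows where the limit was actually active; the 100% row is left out because at that setting the workload itself, not the limit, capped power at 390W.

```python
from statistics import median

# (measured GPU power in W, Afterburner power limit in %), from the table above.
limited_rows = [
    (330, 80), (300, 70), (260, 60), (240, 55),
    (220, 50), (180, 40), (150, 33),
]

# If watts = limit% * full_power, then full_power = watts / (limit% / 100).
implied_full_power = [watts / (pct / 100) for watts, pct in limited_rows]

for (watts, pct), implied in zip(limited_rows, implied_full_power):
    print(f"{pct:>3}% -> {watts} W measured, implies 100% = {implied:.0f} W")

print(f"Median implied 100% level: {median(implied_full_power):.0f} W")
```

The implied values range from a little over 410W to about 455W, with the median close to the 440W figure.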

Gaming Performance Follows Suit

The diminishing performance returns of the 4090 have been evaluated before. For example, in this Reddit post a user shared 3DMark Fire Strike scores from an RTX 4090. The outcome is the same: you get 80% of the performance at a 50% power limit.

3DMark FireStrike scores
