dev-resources.site

for different kinds of informations.

LLM Fine-tunig on RTX 4090: 90% Performance at 55% Power

Published at

5/29/2024

Categories

4 categories in total

Author

11 person written this

maximsaplin

open

LLM Fine-tunig on RTX 4090: 90% Performance at 55% Power

At just a fraction of power, 4090 is capable of delivering almost full performance.

While running SFT (supervised fine-tuning) via Hugginface's TRL library (using Torch as a backend) I decided to move Afterburner power slider down:

And checked wandb dashboards for changes in training speed (epochs per hour) and GPU power:

Here's the full table with performance (and a few other measurements) at different power levels:

Power, W	Temp, °C	Afterburner PWR %	Perf (Epoch/Hour)	Perf/kW	Power %	Perf %
390	72	100%	0,442	1,134	100,0%	100,0%
330	70	80%	0,436	1,322	84,6%	98,6%
300	67	70%	0,413	1,378	76,9%	93,4%
260	62	60%	0,405	1,557	66,7%	91,5%
240	60	55%	0,394	1,644	61,5%	89,2%
220	58	50%	0,365	1,660	56,4%	82,6%
180	52	40%	0,271	1,508	46,2%	61,3%
150	47	33%	0,221	1,473	38,5%	49,9%

If you run long training sessions on your RTX 4090 PC and would like to save on electricity bills OR keep your room cooler (500W midi tower is quite a heater), limiting GPU power to 50-60% makes total sense.

Besides there's a sweet spot at 50% (220W) with maximum efficiency (performance-per-watt or trained-epochs-per-watt*hour). At this power level, you still get 82% of the max speed.

Few Notes on RTX 4090 Power Levels

Most desktop RTX 4090 cards are rated at 450W, such as mine (Palit 4090 flashed with Asus 450W 1.1v BIOS). There're versions with 500W, 600W and even 666W power limits.

I could see 450W power consumption in the OCCT synthetic benchmark. In 3D Mark TimeSpy max power observed was around 430W.

While running the above training (full fine-tuning) the max reported power was 390W - TRL/Torch was not able to fully utilize the GPU (actual utilization being at around 90%). This can be explained by not filling the entirety of VRAM (~20GB out of 24GB). And it could be fixed by increasing the batch size training param (and risking VRAM overflow into shared memory significantly slowing down the total training time). On some other occasions, I could see 410-420W from TRL running LORA fine-tuning.

Based on actual GPU power reported it seems that Afterburner power limits were calculated assuming 100% is 440W.

Gaming Performance Follows Suit

The diminishing performance returns of 4090 have been evaluated before. E.g. in this reddit post a user shared 3DMark FireStrike scores from RTX 4090. The outcomes are the same, you get 80% performance at a 50% power limit.

hardware Article's

30 articles in total

AI in Your Hands: Nvidia’s $3,000 Supercomputer Changes Everything