Unable to change power limit with nvidia-smi #483
Comments
I also cannot change the power limit with nvidia-smi on this driver, while I can on the 525 series (I have an RTX 3050 laptop). On 530 it says "Changing power limit is not supported for this GPU". So which is the intended behavior? Make up your mind, NVIDIA. I really hope it's not the latter, because of this fun issue which also happens to me: #435 |
Relevant discussion here |
Oh, so it's a wide-reaching problem, not just for the 2060 and 3050. |
With 530.30.02 (beta) the power limits were working fine for the first time on my laptop with an RTX 2060 Mobile 6GB. It broke after the 530.42.03 update. |
Using the 525 version of the driver, my RTX 3060 is able to limit power for the first time, but since updating to the 530 version of the driver, it has been impossible to limit power. |
If anybody here is using a Lenovo gaming laptop with the 530 drivers |
got same issue |
on a 3060 AMD Legion 5 |
Legion S7 15ACH6 / AMD Ryzen 7 5800H / NVIDIA GeForce RTX 3060. Same issue |
driver version? |
530.41.03-15 |
try 535 beta |
unless it doesn't exist for the open kernel modules, then test proprietary |
Still have the issue on the 535 beta proprietary driver, which is the only one with the "fix", I think |
hmmm |
test with the 525 branch (proprietary, or open kernel modules if it exists) |
Ryzen 5 5600H / NVIDIA GeForce RTX 3060 Laptop |
did you test 535 (prop) |
@barokvanzieks Yesterday I checked how the power limit behaves on many driver versions.
It turns out the power limit does not work since version 53X.XX.XX; it works on all 525.xx.xx drivers. |
hmmmmm ok, I haven't tested 535 myself, guess I'll stay on 525 for now |
I checked the latest version of the NVIDIA driver, 535.54.03, with DKMS and it is still NOT WORKING (i5-12500H / NVIDIA GeForce RTX 3060 Laptop / EndeavourOS) |
I think this is intended at this point. They also unlocked it on Windows around driver 528 and locked it back shortly after on laptops, so maybe this is only for desktop GPUs. |
@kleidiss This is a very important capability. Laptops get very hot and drain the battery quickly. This feature is required! |
I agree. It would be worthwhile checking whether anyone has diffed the sources and figured out how to make a patch to enable it again. If not, I will attempt that this weekend, because right now my laptop goes to 80-81C (with a REALLY good cooling pad)... previously I kept the framerate stable at 55C. |
I just noticed it did not work anymore on my system; reverting back to 525.105.17 and everything is good. Thank you. The laptop is a TongFang GM5MG0O (BIOS N.1.09A08) // i7-10875H // RTX 3070 (BIOS 94.04.3F.00.83) |
that'd be super cool of you |
but yes, 535 isn't working for me either, and it's a pain in the arse since now I have to fuck with drivers to get OpenCL working |
this feature is essential for me since I can up the wattage by about 50 watts from the stock 80 |
the question is which file it would be in... |
Same issue for me on a Lenovo T15 gen 2 |
Lenovo Y9000P, NVIDIA 4060 mobile. The maximum power is 140W, but it gets enforced at 55W. |
Try driver version 525. |
This is really frustrating, especially since RTX2060 is a Turing architecture. |
I wonder if there is any correlation between high temperatures and the GPU performance lock, due to suboptimal automatic power control. I can't verify this for sure until I get a proper setup, but netdata helped find an anomaly which potentially contributed to the GPU locking into an underperforming state. EDIT: netdata, not netstat |
Guess it's time to swap to Gentoo https://packages.gentoo.org/packages/x11-drivers/nvidia-drivers |
However, going to 525 means giving up all those Wayland patches in the new 555 build |
What does 525 have that 555 does not? This is beyond my expertise, but it would be interesting to know whether it is possible for future NVIDIA driver versions to adjust power modes on 10-, 16-, and 20-series GPUs. Currently my system struggles to decide when to switch power modes: sometimes it sits at Level 1 for no reason, and then Level 3. Temperatures look alright, as does the power supply. Maybe it is not proper to share this image, but this is how it works on my 2070 Super Max-Q, using the 555 driver: |
Nvidia simply disabled the feature because the user is rarted apparently. |
Well, it is like riding a motorbike: if the user ignores the sensors and rides at maximum speed in bad conditions, it will crash and burn. Neither option is good now, as the automatic power state management is neither safe, nor stable, nor predictable. It's like having a Kawasaki H2R which is capable of going 400 km/h, but sometimes even when riding it at 200 km/h it decides to throttle itself to 100 km/h for no reason whatsoever (or there is a reason, but it is not visible), and then suddenly it works normally again, until it doesn't. Imagine the frustration of the rider. I can't give a better example. I will look more into this; hopefully the code is not too complicated. |
The GPU is rated for 65W but it's capped at 35W unless I use powerd (which doesn't work anyway) or I use 525. So your analogy is wrong. What NVIDIA has done is remove the user's ability to switch their motorbike from standard to race mode, meaning they are constantly capped at half the HP!! If a motorcycle manufacturer did this there would be an uproar. |
Successfully went from 55W -> 125W with this |
@polytect can you explain the steps that you took with netstat? @Qervas I am already at 150W, but the issue is that even though I am not using it for anything, the fans are running at nearly 100%. Do you have this issue on your side? And does the power limit work on 560?
Fri Sep 13 23:39:36 2024 (nvidia-smi output table omitted) |
I am a Debian 12 user. The way I max out the power limit is to install nvidia-powerd with apt and run it as root on boot (I did this through crontab). |
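A minimal sketch of that Debian route, assuming the package and service are both named nvidia-powerd (the package name comes from the comment above, the service name from NVIDIA's Dynamic Boost docs), and using systemd rather than crontab:

```shell
# Install the Dynamic Boost daemon (assumed package name)
sudo apt install nvidia-powerd

# Start it at boot instead of using a crontab entry
sudo systemctl enable --now nvidia-powerd.service

# Verify it is running
systemctl status nvidia-powerd.service
```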
Always good to keep in mind that nvidia-powerd / Dynamic Boost can be used as long as you have an Ampere or newer card, not the Turing card that started this issue. https://download.nvidia.com/XFree86/Linux-x86_64/560.35.03/README/dynamicboost.html |
FWIW, I spent over 6 hours chasing custom power drivers, and I don't really recommend that. This is super annoying since there shouldn't be a technical reason to prevent you from adjusting the TDP within the already-supported range. Not only was the functionality removed, but given the way the current software ecosystem functions, there are very few practically feasible ways to revert and pin the needed behaviour. |
@xepost Sorry, my confusion. I used NetData, not netstat. This is the NetData local service, with dashboards of system logs: |
FWIW I recently discovered that Gentoo ships the powerd daemon as an OpenRC script when you use the 'powerd' USE flag. It's /etc/init.d/nvidia-powerd. |
i was the one who reported it lol: https://bugs.gentoo.org/923117 |
This is exactly how I realized. Thank you, you legend! |
My Surface Laptop Studio has the same issue on Linux. I did find two workarounds online which work on all distros and could work for other computers. Could others please confirm that these work? |
I was also facing the same issue where nvidia-smi -pl stopped working. My RTX 4080 Laptop GPU was drawing up to 150W by default, which caused a lot of heat and noise. Since power limit adjustments aren’t allowed anymore, here’s what worked for me to cap the power around 80W: 1. Check Available Clock Frequencies: Run this to see your GPU’s supported frequency range:
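The command itself was not included above; a likely equivalent, assuming a stock nvidia-smi install:

```shell
# List the graphics/memory clock combinations the GPU reports as supported
nvidia-smi -q -d SUPPORTED_CLOCKS
```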
For me, the range was 210 MHz (min) to 3105 MHz (max). 2. Lock GPU Clocks: I set the clock range to balance performance and efficiency:
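Again the exact invocation is not shown; the standard way to lock the graphics clock range is below. The 210 MHz floor comes from the comment; the 2100 MHz ceiling is only an illustrative value, not the commenter's actual setting:

```shell
# Lock the graphics clock to a min,max range in MHz
sudo nvidia-smi --lock-gpu-clocks=210,2100
```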
3. Enable Persistence Mode: This reduced the power usage to around 80W, which is sufficient for my LLM workloads (like inference) with no significant performance impact:
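Presumably this step used the standard persistence-mode toggle:

```shell
# Keep the driver initialized so the locked clocks persist between workloads
sudo nvidia-smi -pm 1
```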
4. Automate Settings: To automate the GPU clock and power management settings so they apply at every reboot, I created a systemd service. Create the Service File: Run the following command to create a new service file:
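The unit name below is hypothetical (the original file name was not given); any name under /etc/systemd/system/ works:

```shell
sudo nano /etc/systemd/system/nvidia-clock-lock.service
```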
Add the Following Configuration: Paste this code into the file:
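A minimal oneshot unit that re-applies the settings from steps 2 and 3 at boot; the binary path and clock values are assumptions to adjust for your system:

```ini
[Unit]
Description=Lock NVIDIA GPU clocks at boot (illustrative example)
After=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi --lock-gpu-clocks=210,2100

[Install]
WantedBy=multi-user.target
```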
Save and Enable the Service: Save the file. Then, enable and start the service:
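Standard systemd enable/start, assuming the hypothetical unit name used above:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-clock-lock.service
```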
To show the effectiveness of this setup, I've included two screenshots. Idle State: The first screenshot shows the GPU at idle, where the power usage is around 3W out of 80W, thanks to the minimum clock setting of 210 MHz. Since I am using a second external screen, the GPU remains active to some extent to support the display. Active LLM Processing: The second screenshot captures the GPU while actively drawing power for LLM workloads. Here, the power usage stabilizes at 79/80W, showing that the maximum power limit is being fully utilized without exceeding the cap. This keeps the GPU efficient, cool, and quiet while maintaining strong performance for LLM tasks. This workaround works great for keeping my GPU at 80W, which reduces heat and fan noise while maintaining solid performance. Hope this helps! 😊 |
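To verify the cap without screenshots, one option is to poll the relevant counters with nvidia-smi (the query field names assume a reasonably recent driver):

```shell
# Print power draw, SM clock, and temperature once per second
nvidia-smi --query-gpu=power.draw,clocks.sm,temperature.gpu --format=csv -l 1
```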
Sadly, NVIDIA blocked the frequency lock on the GTX 1060M with the proprietary driver. Wish I could do that too. |
Still happens with Ubuntu 24.10 and a 4060 mobile 😢 |
Same for me |
Following the instructions at https://download.nvidia.com/XFree86/Linux-x86_64/560.35.03/README/dynamicboost.html with my 3050 mobile on Ubuntu did not work right away. Ignoring that, I manually installed the service and started it. Testing again under load, I found it was drawing max power and that the cap was removed. Hopefully this helps someone. |
Seems that Windows drivers don't fare any better; power limits are also locked there.
Context: I'm trying to lower the power limit while my laptop (with a GTX 1650) runs on battery. The default power limit on battery is 30W, but I wanted to lower it by half and was unable to. |
Still an issue with the NVIDIA driver 565.77 on Arch Linux. The power limit is set to 35W with dynamic boost to 50W, although my device supports a max GPU TGP of 65W. |
The 525 drivers don't have this issue and work fine for me (including games which "require" newer drivers). For Arch and derived distros, they can be installed from the AUR. The only notable bug in these drivers is that the GPU can only output up to your monitor's refresh rate. They can be installed on other distros, but it requires more work and may have more issues. I tried the instructions to install the 525 drivers on a non-Arch distro and it took more effort: you need the kernel headers and dkms (optional, for third-party kernels), and you have to reboot into runlevel 3. You will also need to patch the drivers for kernel updates (patches are included in the download but may need to be reapplied over time). This worked on Fedora, but my monitor was capped at 60 Hz (it normally supports 120 Hz). For both of these reasons, I just switched back to an Arch-based distro. I previously tried disabling D3 mode and that worked as well, but the GPU has to run all the time and the power is still capped a bit lower than it should be. Still a lot better than the default behavior, but not as good as the 525 drivers in my opinion. |
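For the non-Arch route described above, a sketch of the usual .run-installer workflow; the version number, package names, and flags are assumptions based on NVIDIA's standard legacy downloads, not the commenter's exact steps:

```shell
# Kernel headers and dkms first (Fedora example; use your distro's equivalent)
sudo dnf install kernel-devel kernel-headers dkms

# Fetch a 525-series installer from download.nvidia.com (version shown is only an example)
wget https://download.nvidia.com/XFree86/Linux-x86_64/525.147.05/NVIDIA-Linux-x86_64-525.147.05.run

# Drop to a text-only target ("mode 3") so the installer can unload the running driver
sudo systemctl isolate multi-user.target

# Run the installer; --dkms rebuilds the kernel module automatically on kernel updates
sudo sh NVIDIA-Linux-x86_64-525.147.05.run --dkms
```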
NVIDIA Open GPU Kernel Modules Version
530.41.03-1
Does this happen with the proprietary driver (of the same version) as well?
Yes
Operating System and Version
Arch Linux
Kernel Release
Linux 6.2.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 22 Mar 2023 22:52:35 +0000 x86_64 GNU/Linux
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 2060 (UUID: GPU-2f685ce6-33f4-db75-ce05-81d1723a6ddb)
Describe the bug
Before a recent update, I was able to execute:
~$ sudo nvidia-smi --power-limit 60
and have it work as expected.
After the update, this is the output:
Changing power limits caused observable changes both in temperature and in performance, so I am pretty sure my GPU supports it.
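One way to check what the driver reports about power-limit support is a standard nvidia-smi query (this is a suggestion, not part of the original report):

```shell
# Show default/min/max/current enforced power limits as reported by the driver
nvidia-smi -q -d POWER
```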
For context:
I found that the default power limit of 80W tends to heat up the GPU enough that it starts throttling itself and causes stuttering; 60W seemed to work perfectly and keep it under 63C during everything, without boosting my fan. The computer itself is a laptop (which explains the issues with heat dissipation), a Lenovo Legion Y740, and I have a pretty good cooling pad to help out.
To Reproduce
~# nvidia-smi --power-limit 60
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
Just a bit more context:
You may notice that in my Xorg config I use an option to skip the EDID check for the HDMI-0 output. The reason is that either the driver doesn't recognize my (roughly 7-year-old) monitor, or the checksum the monitor outputs is invalid, and then it wouldn't let me use Full HD resolution on that monitor.
No idea if this is in any way related (I presume it's not), but this setup worked for over 5-6 months, so I doubt it's related.