T440s with kernel >= 3.15 doesn't power off properly (Analysis and possible solution included !) #112
Comments
Do you have code to put the PCIe root in D3 as well ? Would gladly integrate it in #102. |
This is what I've been using this last week on my T440s : https://github.com/smunaut/bbswitch/commit/ca87980cb6105c34e1a138b553eac602d8151519 Note that this is non-conditional ... maybe there should be a whitelist of supported machines ? |
@smunaut: this change causes memory corruption after I wake up the NVidia card (i.e. back to base one). I am really not a kernel developer, but the obvious problem is that the patch puts the port in D3 but the actual PCIe driver is not informed of that. I've checked the kernel source code, PCIe ports do not seem to have runtime suspend/resume. That would probably be the better fix (use a dummy nvidia driver to suspend/resume the NV card using the kernel PM handlers and have the PCIe root port go into suspend accordingly). I already wrote the dummy driver, but I'm now trying to find someone I could contact that would be willing to look into this. If you know anybody doing kernel dev ... The alternative (that I'm going to try as soon as I have the time) would be to unbind the PCIe root port driver from the port and bind yet-another-dummy-driver to put the port to sleep. |
@doudou Mmm ... when I think about it, it must be #78 ... I mean for all intents and purposes, my patch will trigger the same behavior in ACPI as when using the acpi_osi string to revert to the old shutdown behavior. AFAIK there are no drivers bound to the PCIe ports themselves, which is why they don't do automatic runtime suspend. I did raise the issue on linux-pm and linux-acpi but didn't really get any answers. |
It seems that there is. lspci reports "pcieport", which matches the driver in drivers/pci/pcie/portdrv_pci.c. As to whether we can unbind it, that's another story altogether ;-) |
Yes, it seems to work. In /sys/bus/pci/drivers/pcieport After
the driver is not listed as driver of the port, and after
it gets listed again |
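The exact commands used in the comment above did not survive in this thread. As a rough sketch only, this is how the standard sysfs bind/unbind files of a PCI driver can be driven from Python. The PCI address `0000:00:01.0` is a hypothetical placeholder, the `sysfs_root` parameter exists only so the helper can be tried against a scratch directory, and writing to the real files needs root. It may not match what was actually run.

```python
from pathlib import Path

# Relative location of the pcieport driver directory inside sysfs.
DRIVER_DIR = Path("bus") / "pci" / "drivers" / "pcieport"


def set_pcieport_binding(address: str, bound: bool, sysfs_root: str = "/sys") -> Path:
    """Write a PCI address to the driver's 'bind' or 'unbind' control file.

    `address` is a full PCI address such as "0000:00:01.0" (hypothetical here).
    Returns the control file that was written.
    """
    control = Path(sysfs_root) / DRIVER_DIR / ("bind" if bound else "unbind")
    control.write_text(address)
    return control


def is_bound_to_pcieport(address: str, sysfs_root: str = "/sys") -> bool:
    """A bound device shows up as an entry named after its address."""
    return (Path(sysfs_root) / DRIVER_DIR / address).exists()
```

Calling `set_pcieport_binding("0000:00:01.0", bound=False)` and then `is_bound_to_pcieport("0000:00:01.0")` would correspond to the "not listed" check described above, assuming that address really is the root port on your machine.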
I'd also point out that doing the ACPI DSM call is useless with this new method. It's not required and so maybe removing it would help your issues ? |
Yes, I already removed the DSM calls. I'm going to push the module I wrote, which basically registers a PCI driver for the NV card and for the PCIe root port, and manages to make them autosuspend. They both go to D3 on autosuspend, and then boom, the system crashes hard. I did not investigate further, I should really be doing something else than this ;-) I start to believe that the corruption issues directly stem from the hard shutdown of the NV card. |
Some progress. I've managed to get to a state where both the NVidia card and the PCIe port get into suspend, using only normal runtime suspend kernel paths. Basically, they get into D3 thanks to the common PCI power-management code. The code is on this branch. The catch: I get the memory corruption anyway as soon as the card(s) are woken up :( I want to read the Intel documentation to find out whether there are things that need to be done before one is allowed to put a PCIe port in D3. Note that because I use runtime suspend, you cannot use lspci to check the card's state. This wakes it up. I've used systemtap and powertop to verify that (1) the kernel was attempting to put the cards in D3 and succeeded, and (2) that it led to significant power savings (it did, almost 2W more than with only the NV card in D3). |
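Since lspci wakes a runtime-suspended card, one alternative is to read the device's runtime PM state from sysfs. This is a minimal sketch, assuming that reading `power/runtime_status` does not itself resume the device (that is my understanding of the runtime PM sysfs interface, not something verified in this thread). The address `0000:01:00.0` is a placeholder and `sysfs_root` is only there to make the helper testable.

```python
from pathlib import Path


def runtime_pm_status(address: str, sysfs_root: str = "/sys") -> str:
    """Return the runtime PM status string for a PCI device.

    Typical values are "active" and "suspended"; a device without
    runtime PM support reports "unsupported".
    """
    status_file = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / address
        / "power" / "runtime_status"
    )
    return status_file.read_text().strip()


def is_runtime_suspended(address: str, sysfs_root: str = "/sys") -> bool:
    """True when the kernel reports the device as runtime suspended."""
    return runtime_pm_status(address, sysfs_root) == "suspended"
```

This only reports what the kernel believes. It does not prove the card actually lost power, which is why powertop was still needed for the measurements above.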
@doudou Does this code mess up non thinkpad laptops? Like do you plan on doing a pull request? |
Don't know ... only have a thinkpad
No, given its effects on my laptop ... I already have a pull request (#102) on a method that kind-of works. Meaning:
I personally use this one on a 4.0.7 kernel. Left the rest on the side, I don't have time for this right now. |
@doudou Ok. Cool. Thanks for the info. If you ever have time to get it in a state where you'd like it to be tested on a non thinkpad I have a spare optimus ideapad I could test it on. |
I successfully tested the patch on Fedora 23 / 4.2.1-300 on a Thinkpad W541 / Optimus K2100 and power consumption went from 25W to 19W. Thanks! |
Is there a plan to merge this fix into mainline? I'm about to jump to F23. |
I'm seeing some possibly related issue with a brand new T450s |
It'd be very nice if we could get this mainlined rather than have it as a required patch. |
@storrgie The issue is that this fix is completely ThinkPad specific ... It would probably break things on other brands. (and even on thinkpads it's only been tested on a few models) |
this is still failing for me, I'm getting this when trying to load bbswitch
|
trying to turn off with tee:
|
bumblebee works correctly, in the sense of loading the nvidia driver when run with optirun, it's just that the card won't switch off |
here's my lspci -v output for the nvidia card (while running glxgears with optirun)
|
dmesg when running optirun:
and when stopping it:
|
There is nothing that indicates in your logs that it's not working. The "Refused to change power state" is inconsequential. The only way to check if it's working or not is to check if the rev is "ff" during a lspci when it's turned off. |
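For anyone who wants to script the "rev ff" check, here is a small sketch that inspects one line of lspci output. The sample lines in the usage note are made up for illustration, and this only applies to the bbswitch method: as mentioned earlier in the thread, with runtime PM the lspci call itself wakes the card.

```python
import re

# lspci prints the revision as "(rev xx)" at the end of the device line.
REV_RE = re.compile(r"\(rev ([0-9a-fA-F]{2})\)")


def card_looks_off(lspci_line: str) -> bool:
    """Return True if the lspci line reports revision ff.

    A powered-down card answers config reads with all ones, so the
    revision byte shows up as ff.
    """
    match = REV_RE.search(lspci_line)
    return match is not None and match.group(1).lower() == "ff"
```

For example, `card_looks_off("01:00.0 3D controller: NVIDIA Corporation Device 0000 (rev ff)")` is true, while the same line ending in `(rev a1)` gives false.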
@smunaut the rev is not ff, it doesn't change. The first time I tried to turn it off with the vanilla module I got a traceback, as well, I'd forgotten about that:
Also, according to KDE's battery status panel, I do get an hour less of battery while optirun is running something, but when it's supposed to be turned off, I'm still getting this with lspci
Notice that as reported above by other people, I'm also getting
regardless |
Are you sure you took the source from my repo and switched to the right branch ? (the default at checkout is 'master' and doesn't have the patch) |
yes, branch
|
I have a Lenovo T450s with Fedora 23.
I downloaded doudou's bbswitch code from https://github.com/doudou/bbswitch, built it with 'make' and loaded it with 'make load' with no error. But the Nvidia card is still on.
What's the problem? I'm a user rather than a programmer. I just want a long battery life on Linux. |
@valneacsu, using acpi_osi=Linux solves my problem. Thank you! |
For those who are wondering why the methods are different depending on the kernel version (or the setting of David Airlie posted a patch to handle the power resources for nouveau/vgaswitcheroo, but I don't know what happened to those patches (https://lkml.org/lkml/2016/3/9/65). You could probably check the power resources of the parent device, but that smells hacky if you call it directly (shouldn't the Linux PM take care of this?). See also the analysis at #115 (comment). While turning off has no DSM methods, surprisingly there are some involved with turning it on. Can you reproduce/match this with your machines? (Will comment later on the patch, my battery is dying) |
David Airlie's patch definitely looks interesting. Even more so because the PM handler is exported, and could therefore be used outside of vgaswitcheroo-enabled drivers. |
If you look at the LKML thread, it seems that the kernel devs are tackling the root problem ... that is adding runtime PM to the PCIe root ports. What it means for the future of bbswitch, I'm not sure, since it will work only if the card itself is put in D3 first.
On mine, the NVidia suspend/resume with pcie-root-port in D3 works without any calls to any DSM. |
I read the full thread (and some of the linked patchwork entries). David's patch adds a function to enable PM operations that power off/on the parent (PCIe port) device and hooks it into nouveau. Rafael commented that the PM patches for PCIe ports are still under discussion. Especially note:
Edit: latest version of PCIe patches are scheduled for v4.7, see http://article.gmane.org/gmane.linux.power-management.general/75997 and https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/pm As for bbswitch's future, it is currently broken for newer devices so I'll probably do this:
In case upstream Linux adds proper PM to the PCIe root port, then bbswitch should not interfere with that, so there also needs to be checks for that (PM domains?). Some reading material: https://www.kernel.org/doc/Documentation/power/devices.txt bbswitch differentiates itself from nouveau in that it will always keep the device powered off unless explicitly asked by the user (via /proc/acpi/bbswitch). Currently nouveau will flip the card on when you execute
Did you observe this on Windows? Is Windows doing any DSM calls after D0? |
This is due to the usage of runtime PM. lspci triggers a wakeup. I had a version of bbswitch that was purely relying on runtime PM and noticed that. Once all the PCIe root port work is in the kernel, I'm planning to try nouveau instead of bbswitch just for the PM work. I still rely on the nvidia proprietary driver for proper OpenGL support.
I would not even begin to know how I could check that. |
Just wanted to confirm this issue on a Thinkpad W550s running BIOS version 1.14. |
@shadoxx https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1452979 lists acpidump for a W550s (Quadro K620M, BIOS 1.02). The SSDT changes are not relevant. The DSDT has changed a bit, mainly wrt USB 3.0, plus some TPM changes. These are the hunks relevant for graphics (GPON, GPOF and Win10 detection):
diff --git a/acpi-1.02/dsdt.dsl b/acpi-1.14/dsdt.dsl
index 53b2c77..9717867 100644
--- a/acpi-1.02/dsdt.dsl
+++ b/acpi-1.14/dsdt.dsl
@@ -1051,24 +1051,30 @@ DefinitionBlock ("dsdt.aml", "DSDT", 1, "LENOVO", "TP-N11 ", 0x00001020)
If (\_OSI ("Windows 2012"))
{
\WIN8 = 0x01
OSYS = 0x07DC
}
If (\_OSI ("Windows 2013"))
{
\WIN8 = 0x01
OSYS = 0x07DD
}
+ If (\_OSI ("Windows 2015"))
+ {
+ \WIN8 = 0x01
+ OSYS = 0x07DF
+ }
+
If (\_OSI ("Linux"))
{
\LNUX = 0x01
OSYS = 0x03E8
}
If (\_OSI ("FreeBSD"))
{
\LNUX = 0x01
OSYS = 0x03E8
}
}
@@ -6774,32 +6770,36 @@ DefinitionBlock ("dsdt.aml", "DSDT", 1, "LENOVO", "TP-N11 ", 0x00001020)
{
GPOF (0x00)
}
}
Method (GPON, 1, NotSerialized)
{
If (ISOP ())
{
If (DGOS)
{
\VHYB (0x02, 0x00)
- Sleep (0x64)
+ Sleep (0x14)
If ((ToInteger (Arg0) == 0x00)) {}
\VHYB (0x00, 0x01)
- Local0 = 0x00
- While ((Local0 < 0x5A))
+ Sleep (0x14)
+ Local2 = \VHYB (0x0E, 0x00)
+ While ((Local2 != 0x0F))
{
- Local0 += One
- Stall (0x64)
+ \VHYB (0x00, 0x00)
+ Sleep (0x14)
+ \VHYB (0x00, 0x01)
+ Sleep (0x0A)
+ Local2 = \VHYB (0x0E, 0x00)
}
\VHYB (0x02, 0x01)
Sleep (0x01)
\VHYB (0x08, 0x01)
Local0 = 0x0A
Local1 = 0x32
LREN = LTRS /* \_SB_.PCI0.PEG_.LTRS */
CEDR = One
While (Local1)
{
Sleep (Local0)
@@ -6851,31 +6851,25 @@ DefinitionBlock ("dsdt.aml", "DSDT", 1, "LENOVO", "TP-N11 ", 0x00001020)
Method (GPOF, 1, NotSerialized)
{
If (ISOP ())
{
If ((VMSH || (\_SB.PCI0.PEG.VID.OMPR == 0x03)))
{
LTRS = LREN /* \_SB_.PCI0.PEG_.LREN */
\SWTT (0x00)
\VHYB (0x08, 0x00)
\VHYB (0x08, 0x02)
\VHYB (0x02, 0x00)
- Local0 = 0x00
- While ((Local0 < 0x1E))
- {
- Local0 += One
- Stall (0x64)
- }
-
+ Sleep (0x09)
\VHYB (0x00, 0x00)
If ((ToInteger (Arg0) == 0x00)) {}
DGOS = One
\_SB.PCI0.PEG.VID.OMPR = 0x02
}
}
}
Method (_STA, 0, NotSerialized) // _STA: Status
{
Return (0x0F)
}
The ACPI changes do not look very significant; mainly the timings changed. GPOF went from 30×100µs (=3ms) to 9ms. GPON is slightly more interesting: it changes the timings and adds an SMI call (unknown function). Maybe there were other non-ACPI tunings. Anyway, maybe things get better when the power resources are used instead of DSM. |
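To make the timing comparison easy to re-check, here is the arithmetic as a short sketch. It relies on the ACPI definitions that Stall() takes microseconds and Sleep() takes milliseconds, and it ignores loop overhead. The new GPON polling loop has no fixed iteration count in the diff, so it is left out.

```python
STALL_UNIT_US = 1      # ACPI Stall() argument is in microseconds
SLEEP_UNIT_US = 1000   # ACPI Sleep() argument is in milliseconds


def stall_loop_us(iterations: int, stall_arg: int) -> int:
    """Total delay of a loop that calls Stall(stall_arg) `iterations` times."""
    return iterations * stall_arg * STALL_UNIT_US


def sleep_us(sleep_arg: int) -> int:
    """Delay of a single Sleep(sleep_arg) call, in microseconds."""
    return sleep_arg * SLEEP_UNIT_US


# GPOF: the old BIOS looped 0x1E times over Stall(0x64), the new one uses Sleep(0x09).
gpof_old_us = stall_loop_us(0x1E, 0x64)
gpof_new_us = sleep_us(0x09)

# GPON: first delay after \VHYB (0x02, 0x00), and the old fixed wait loop.
gpon_first_sleep_old_us = sleep_us(0x64)
gpon_first_sleep_new_us = sleep_us(0x14)
gpon_old_loop_us = stall_loop_us(0x5A, 0x64)

print(gpof_old_us, gpof_new_us)                          # 3000 9000
print(gpon_first_sleep_old_us, gpon_first_sleep_new_us)  # 100000 20000
print(gpon_old_loop_us)                                  # 9000
```

So GPOF waits three times longer in the new BIOS, while the first GPON delay dropped from 100ms to 20ms and the fixed 9ms wait was replaced by polling on the result of `\VHYB (0x0E, 0x00)`.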
Could someone try some patch series on top of v4.7 with nouveau? See #78 (comment) |
FYI, this has been fixed for the nouveau in Linux v4.8-rc1, bbswitch still needs an update though. |
@Lekensteyn |
@BernardoGO What suspend problem? System sleep or runtime suspend? |
Just to leave another footprint in this epic issue. If you are using Ubuntu 16.04 (and probably other distros with kernels >4.4.X) with nvidia-361, note that it was force-replaced with nvidia-367 (you install 361, but apt gives you 367 alongside 361), so you could experience all the problems again. First of all, use up-to-date bumblebee and primus from the bumblebee/testing ppa and install nvidia-367. Then, after it starts to work, check bbswitch, since it has probably stopped working properly. Worked with Lenovo T440s. |
just an update, on a ThinkPad T450s, the card is constantly on and the grub fix above doesn't work (plasma never finishes loading and a bunch of error messages show up in the logs). |
Dear Tomas Neme,
Thank you for your email. I have already successfully set up bumblebee on the T450s.
Best,
Jianghua Wu
|
Switch to the development branch, as it's basically stable at this point, and pinch Arch's 5.18 compat patch. Also remove existing patches: - PR 102 was rejected upstream, the actual issue is Bumblebee-Project/bbswitch#112, and can be worked around via acpi_osi=Linux - PR 196 is applied upstream. (cherry picked from commit e1c59ea)
Hi,
So I'm using a T440s with a modern kernel that reports "Windows 2013" compatibility (and soon Windows 2015 with kernel 4.2). This breaks bbswitch because of some changes this triggers in the ACPI table. Manually overriding acpi_osi does fix the issue and allows the current bbswitch to work, but what I'm looking at here is how to make it work with the "new" Win 8.1 method of shutting down the card.
So the main symptoms of the issue are :
This is the DSDT table from the T440s with the latest bios (which even has Windows 10 support) :
http://pastebin.com/raw.php?i=C6Q3A8aa
The important thing to note is that when the "Windows 2013" string is found, OSYS is set to 0x07DD. This in turn causes VMSH to be set to 1, which causes SB.PCI0.PEG_.VID_._PS3 to NOT call GPOF ... and so the card is never really turned off completely.
Now if you look at how GPOF can be called, you can see it will be called as part of NVP3 power resource which is PR3 ... but on the node SB.PCI0.PEG and not SB.PCI0.PEG_.VID_ !
So basically you need to put the PCIe root port (parent pci device) in D3 and not just the card.
I tested this and it indeed triggered the expected power saving and seemed to behave exactly as if I had tweaked acpi_osi.
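To illustrate why overriding acpi_osi changes the outcome, here is a toy model of the `_OSI` chain. The check order and OSYS values are copied from the W550s DSDT hunk quoted in this thread, and the T440s table is only assumed to be similar. The rule "VMSH is set when OSYS is 0x07DD or higher" is my assumption for the sketch; the analysis above only states that 0x07DD leads to VMSH being 1.

```python
# (_OSI string, OSYS value) in the order the DSDT tests them; later matches win.
OSI_CHECKS = [
    ("Windows 2012", 0x07DC),
    ("Windows 2013", 0x07DD),
    ("Windows 2015", 0x07DF),
    ("Linux", 0x03E8),
    ("FreeBSD", 0x03E8),
]


def resolve_osys(supported: set, default: int = 0) -> int:
    """Walk the sequential If (\\_OSI (...)) checks and return the final OSYS."""
    osys = default
    for name, value in OSI_CHECKS:
        if name in supported:
            osys = value
    return osys


def ps3_calls_gpof(osys: int) -> bool:
    """ASSUMPTION: VMSH is set for OSYS >= 0x07DD, and _PS3 skips GPOF when it is."""
    vmsh = osys >= 0x07DD
    return not vmsh
```

With `{"Windows 2012", "Windows 2013"}` the model ends at OSYS 0x07DD and `_PS3` no longer calls GPOF. Adding `"Linux"` to the set (what acpi_osi=Linux does) ends at 0x03E8 and brings the old behavior back, which matches the workaround reported in the comments.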