GF76 11-UC #138

Closed
Waujito opened this issue Jul 2, 2024 · 35 comments
Labels
New firmware Request for a new firmware

Comments

@Waujito
Contributor

Waujito commented Jul 2, 2024

Laptop model

MSI Katana GF76 11UC

EC firmware version

17L2EMS1.108

EC memory dump

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 2f 00 37 40 49 4c 52 58
0x7_ | 64 26 26 2b 30 36 3c 46 55 64 08 03 03 03 03 03
0x8_ | 00 00 37 3d 43 49 4f 54 63 00 00 2b 30 36 3c 46
0x9_ | 55 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 39 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 22 00 00 00 00 00 d1 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 c0 00 00
0xf_ | 40 00 70 00 00 64 00 00 64 00 00 00 00 00 00 00

GPU

Nvidia

Is your keyboard RGB?

No (single color)

Additional context

I will try to add support for it myself, but I will probably need your help.
So far I have found:
✔️ Cooler boost
✔️ Webcam toggle
✔️ Webcam block
✔️ Fn <-> Win
✔️ Mic mute LED
✔️ Sound mute LED
✔️ Keyboard backlight intensity

❓ Shift mode
❓ Fan mode

As for shift mode and fan mode, I can't see any real difference between their settings. But when I turned on eco mode my fans went quiet, and that seems good.
Also I can confirm that fan mode somehow affects my fans. I wrote the advanced fan settings to the EC the way MControlCenter does it:

const int fan1SpeedSettingStartAddress = 0x72;
const int fan2SpeedSettingStartAddress = 0x8A;
const int fanSpeedSettingsCount = 7;
const int fan1TempSettingStartAddress = 0x6A;
const int fan2TempSettingStartAddress = 0x82;
const int fanTempSettingsCount = fanSpeedSettingsCount - 1;

fan1Temp = {48, 53, 60, 65, 70, 74};
fan1Settings = {0, 43, 60, 75, 85, 100, 100};
fan2Temp = {50, 55, 60, 65, 70, 72};
fan2Settings = {0, 43, 60, 75, 85, 100, 100};

My fans then seemed completely broken: no fans in auto mode and something in advanced mode. But when I ran stress on my CPU, the fans started to increase RPM.
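To make the write described above concrete, here is a minimal sketch that copies the four arrays into an in-memory copy of the 256-byte EC address space at the MControlCenter offsets quoted in this comment. The struct, the function name and the idea of patching an image buffer first (rather than writing to the EC directly) are assumptions for illustration, not code from msi-ec or MControlCenter.

```c
#include <stdint.h>
#include <string.h>

/* Offsets and counts as quoted from MControlCenter above. */
enum {
	FAN1_TEMP_START  = 0x6A,
	FAN1_SPEED_START = 0x72,
	FAN2_TEMP_START  = 0x82,
	FAN2_SPEED_START = 0x8A,
	FAN_SPEED_COUNT  = 7,
	FAN_TEMP_COUNT   = FAN_SPEED_COUNT - 1,
	EC_IMAGE_SIZE    = 256,
};

/* Hypothetical container for one custom curve (both fans). */
struct fan_curve {
	uint8_t fan1_temp[FAN_TEMP_COUNT];
	uint8_t fan1_speed[FAN_SPEED_COUNT];
	uint8_t fan2_temp[FAN_TEMP_COUNT];
	uint8_t fan2_speed[FAN_SPEED_COUNT];
};

/* The values written in this comment. */
static const struct fan_curve example_curve = {
	.fan1_temp  = {48, 53, 60, 65, 70, 74},
	.fan1_speed = {0, 43, 60, 75, 85, 100, 100},
	.fan2_temp  = {50, 55, 60, 65, 70, 72},
	.fan2_speed = {0, 43, 60, 75, 85, 100, 100},
};

/* Copy the curve into an in-memory image of the EC; nothing else is touched. */
static void fan_curve_patch_image(uint8_t image[EC_IMAGE_SIZE],
				  const struct fan_curve *curve)
{
	memcpy(&image[FAN1_TEMP_START], curve->fan1_temp, FAN_TEMP_COUNT);
	memcpy(&image[FAN1_SPEED_START], curve->fan1_speed, FAN_SPEED_COUNT);
	memcpy(&image[FAN2_TEMP_START], curve->fan2_temp, FAN_TEMP_COUNT);
	memcpy(&image[FAN2_SPEED_START], curve->fan2_speed, FAN_SPEED_COUNT);
}
```

Patching a buffer first makes it easy to diff the result against a real dump before anything is written to the hardware.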

✔️ CPU Temperature
⭕ GPU Temperature
When I run nvidia-smi, the GPU temperature shows up for a short time. This seems to be not an msi-ec problem, just a limitation from MSI.

❓CPU Fan speed
cat: /sys/devices/platform/msi-ec/cpu/realtime_fan_speed: Invalid argument
It turns out to be 50 in boost mode.
⭕ GPU Fan speed
It works, but what is the format? It is neither percent nor RPM.
0 when off, 190 when silent, 78 in boost. It is just raw data from (0xCD).
For the CPU, the problem seems to be in the rt_fan_speed_base_min_max fractions.

The formula from MControlCenter seems like a workaround:

static ssize_t cpu_realtime_fan_speed_show(struct device *device,
					   struct device_attribute *attr,
					   char *buf)
{
	u8 rdata;
	int result;
	int val = 0;

	result = ec_read(conf.cpu.rt_fan_speed_address, &rdata);
	if (result < 0)
		return result;

	/* A raw value of 0 means the fan is stopped; avoid dividing by zero. */
	if (rdata > 0)
		val = 470000 / rdata;

	return sysfs_emit(buf, "%i\n", val);
}

It gives the fan speed directly in RPM and seems OK (2400 in normal mode, 6000 in boost), but I have no idea whether it is correct.
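The arithmetic can be checked outside the kernel. This is a userspace sketch of the same conversion, assuming the raw byte grows as the fan slows down; the function and macro names are made up for the example, and 470000 is the empirical constant from the snippet above.

```c
#include <stdint.h>

/* Empirical constant from the snippet above, not a documented value. */
#define FAN_RPM_CONSTANT 470000u

/* Convert a raw one-byte EC reading to RPM; a raw 0 is treated as "fan off". */
static unsigned int fan_raw_to_rpm(uint8_t raw)
{
	if (raw == 0)
		return 0;
	return FAN_RPM_CONSTANT / raw;
}
```

With the raw values reported above, 190 (silent) gives 2473 and 78 (boost) gives 6025, which matches the rough 2400 and 6000 figures.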

I will test the battery later. Working on the EC config with the battery connected (=> no EC clears) is a dead end :)

@Waujito Waujito added the New firmware Request for a new firmware label Jul 2, 2024
@glpnk
Contributor

glpnk commented Jul 2, 2024

Hi.

My fans then seemed completely broken: no fans in auto mode and something in advanced mode. But when I ran stress on my CPU, the fans started to increase RPM.

Works as intended, because fans are power hungry and drain the battery faster. Fans turn on near 50 degrees on the CPU.

❓ Shift mode
❓ Fan mode

Shift - 0xD2
Fan - 0xD4


UPD: shift mode should change the CPU power limit, but this does not always work as intended


The RPM calculation uses a slightly different constant, 480000 rather than 470000, and 2 bytes instead of 1. Reading RPM would be supported in LM sensors starting with 6.10 kernel, or might be available now on 6.9.x kernels since the new kernel module is already merged.
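A rough sketch of the two-byte variant, under stated assumptions: the two bytes are combined high byte first, and the combined value is a period that is divided into 480000. The byte order, the function name and the macro name are guesses for illustration; this comment does not spell them out.

```c
#include <stdint.h>

/* Constant mentioned in this comment. */
#define FAN_RPM_DIVIDEND 480000u

/*
 * Combine two EC bytes into a 16-bit period and convert it to RPM.
 * ASSUMPTION: "high" comes first in EC memory (big-endian order).
 */
static unsigned int fan_rpm_from_bytes(uint8_t high, uint8_t low)
{
	unsigned int period = ((unsigned int)high << 8) | low;

	if (period == 0)
		return 0; /* fan stopped, avoid dividing by zero */
	return FAN_RPM_DIVIDEND / period;
}
```

Using two bytes lets slow fans report periods above 255, which a single byte cannot represent.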

CPU/GPU fan speed should be in the range of 0-150%, but currently the code applies normalization for CPU %RPM.

Custom fan curve is not supported yet, you can only enable it by changing the fan mode. If necessary, use MControl Center to edit the fan curve.

For me, on a non-MSI laptop with Nvidia GPU, Windows does not always show the temperature. On Linux, the behavior is probably the same, because the GPU is just turned off for better battery life and can't report its temperature.

You can copy any WMI2 named config and change it for your device. Don't use the 0xC8-CF values, because they hold the cooler RPM.

How many coolers do you have in the laptop?

@Waujito
Contributor Author

Waujito commented Jul 2, 2024

Works as intended, because fans are power hungry and drain the battery faster. Fans turn on near 50 degrees on the CPU.

In theory yes, but in practice my computer can be at 75 degrees with slow fans. The behaviour in auto mode with an advanced fan curve written really differs from the behaviour without one (also in auto mode). Just now I tried to set up this curve and my fans stopped (but fan_mode is still auto).

Reading RPM would be supported in LM sensors starting with 6.10 kernel, or might be available now on 6.9.x kernels since the new kernel module is already merged.

Good news! Do you know about pwmconfig and fancontrol support?

How many coolers do you have in the laptop?

2 fans

Shift - 0xD2
Fan - 0xD4

I think I can mark Shift and Fan as ready. No issues with it so far. Works as expected.

I have also prepared a PR that fixes the reversed CPU fan speed (0x4d at max speed and 0xb0 at min speed; 0x00 = no speed).

@glpnk
Contributor

glpnk commented Jul 2, 2024

Technically, it's not a reversed fan speed, but some kind of time per rotation

I'll check your dump for fan curve settings

@Waujito
Contributor Author

Waujito commented Jul 2, 2024

Technically, it's not a reversed fan speed, but some kind of time per rotation

Sounds logical, but what about the other versions that are supported? Is it speed there?

I'll check your dump for fan curve settings

So in other versions it is speed and in mine it is time per rotation?

If you are really interested in the curve:
Without curve:

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 38 00 37 40 49 4c 52 58
0x7_ | 64 2b 26 2b 30 36 3c 46 55 64 08 03 03 03 03 03
0x8_ | 00 00 37 3d 43 49 4f 54 63 00 00 2b 30 36 3c 46
0x9_ | 55 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 3b 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 00 00 00 00 00 00 d5 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 d1 00 00

With curve:

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 39 00 30 35 3c 41 46 4a
0x7_ | 64 3c 00 2b 3c 4b 55 64 64 64 08 03 03 03 03 03
0x8_ | 00 00 32 37 3c 41 46 48 63 00 00 2b 3c 4b 55 64
0x9_ | 64 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 3d 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 00 00 00 00 00 00 e8 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 d1 00 00

Also in decimal, which makes the thermal data easier to understand.

Without curve:

$ od -t u1 -A x /sys/kernel/debug/ec/ec0/io
000000   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000020   0   0   0   0   0   0   0   0  10   5   0   0   0   4  11  11
000030   3   0   0  13   0   0  80 129   0   0   0   0   0   0   0   0
000040   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000060   0   0   0   0   0   0   0   0  55   0  55  64  73  76  82  88
000070 100  43  38  43  48  54  60  70  85 100   8   3   3   3   3   3
000080   0   0  55  61  67  73  79  84  99   0   0  43  48  54  60  70
000090  85 100   8   3   3   3   3   2   2  15 125   2  10 120  59   0
0000a0  49  55  76  50  69  77  83  49  46  49  48  56  48  52  49  48
0000b0  50  48  50  51  49  51  58  52  52  58  52  50   0   0   0  40
0000c0   0   0   7   0   0   0   0   0   0 210   0   0   0   0   0   0
0000d0   0   0 193 131  13   0   5 128   0   1   0   0   0   0   0   0
0000e0 226   0   0   0   0   0   0  64   0   0   0   0   0 209   0   0
0000f0   0   0 112   0   0 100   0   0 100   0   0   0   0   0   0   0

With curve:

$ od -t u1 -A x /sys/kernel/debug/ec/ec0/io
000000   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000020   0   0   0   0   0   0   0   0  10   5   0   0   0   4  11  11
000030   3   0   0   5   0   0  80 129   0   0   0   0   0   0   0   0
000040   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000060   0   0   0   0   0   0   0   0  59   0  48  53  60  65  70  74
000070 100  60   0  43  60  75  85 100 100 100   8   3   3   3   3   3
000080   0   0  50  55  60  65  70  72  99   0   0  43  60  75  85 100
000090 100 100   8   3   3   3   3   2   2  15 125   2  10 120  61   0
0000a0  49  55  76  50  69  77  83  49  46  49  48  56  48  52  49  48
0000b0  50  48  50  51  49  51  58  52  52  58  52  50   0   0   0  40
0000c0   0   0   7   0   0   0   0   0   0   0   0   0   0   0   0   0
0000d0   0   0 193 131  13   0   5 128   0   1   0   0   0   0   0   0
0000e0 226   0   0   0   0   0   0  64   0   0   0   0   0 209   0   0
0000f0   0   0 112   0   0 100   0   0 100   0   0   0   0   0   0   0

As you can see, in the dump without the curve my fan was running, and after the curve was written it stopped (0xc9), even though the temperature after the write was higher (I heated it with stress). The fan mode stays the same (0x0d). I didn't change the fan mode at all, so the fan speed should in fact be the same.
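Comparing two dumps by eye is error prone, so here is a small sketch that lists the offsets at which two 256-byte dumps differ. The function name and the buffer-based interface are assumptions for the example; loading the dumps (for instance from /sys/kernel/debug/ec/ec0/io) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define EC_DUMP_SIZE 256

/*
 * Store the offsets at which the two dumps differ into "changed"
 * (ascending order) and return how many offsets were stored.
 */
static size_t ec_dump_diff(const uint8_t before[EC_DUMP_SIZE],
			   const uint8_t after[EC_DUMP_SIZE],
			   uint8_t changed[EC_DUMP_SIZE])
{
	size_t count = 0;

	for (size_t addr = 0; addr < EC_DUMP_SIZE; addr++) {
		if (before[addr] != after[addr])
			changed[count++] = (uint8_t)addr;
	}
	return count;
}
```

Run on the two dumps above, the curve bytes and 0xc9 would show up among the changed offsets while 0xd4 would not.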

@glpnk
Contributor

glpnk commented Jul 2, 2024

Is it the latest EC/BIOS update? Because it sounds really terrible. Thanks for the dumps

@Waujito
Contributor Author

Waujito commented Jul 2, 2024

Is it the latest EC/BIOS update? Because it sounds really terrible. Thanks for the dumps

Yes, I updated the BIOS a few weeks ago exactly for this reason

@glpnk
Contributor

glpnk commented Jul 2, 2024

Really sad. Do you have the same behaviour on Windows?

@Waujito
Contributor Author

Waujito commented Jul 2, 2024

Really sad. Do you have the same behaviour on Windows?

As I remember, on Windows everything was OK. But I haven't used Windows for about half a year, and not on the new BIOS either. I can try to install it and play with MSI Center, but is it possible to get these dumps there?

@glpnk
Contributor

glpnk commented Jul 2, 2024

@glpnk
Contributor

glpnk commented Jul 3, 2024

According to the dumps, you have not enabled fan curve mode

But for some reason you have changed the temperatures of the fan curve, and not the RPM percentages

To activate the custom fan curve, you need to set 0xD4 to 0x8D
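For reference while reading dumps, a small sketch that names the 0xD4 values that appear in this thread: 0x0D (auto), 0x1D (silent) and 0x8D (custom curve). Only these three are taken from the discussion; the identifiers are made up for the example and other firmware may use other values.

```c
#include <stdint.h>

/* Address of the fan mode byte, as given in this thread. */
#define EC_FAN_MODE_ADDR 0xD4

/* Only the values mentioned in this thread; other values may exist. */
enum {
	FAN_MODE_AUTO     = 0x0D,
	FAN_MODE_SILENT   = 0x1D,
	FAN_MODE_ADVANCED = 0x8D, /* custom fan curve enabled */
};

/* Return a human-readable name for the byte read from EC_FAN_MODE_ADDR. */
static const char *fan_mode_name(uint8_t value)
{
	switch (value) {
	case FAN_MODE_AUTO:
		return "auto";
	case FAN_MODE_SILENT:
		return "silent";
	case FAN_MODE_ADVANCED:
		return "advanced";
	default:
		return "unknown";
	}
}
```

Both dumps above have 0x0d at 0xd4, so this helper would report "auto" for each of them.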

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

According to the dumps, you have not enabled fan curve mode

Yes, and that's the problem. The fan curve is not enabled, but the fan speed changed (even scarier, the fans stopped). I tried writing the curve manually in vim to rule out mistakes in my script, and the behavior is the same.

Also, I installed Windows (spent an entire day trying to install it without USB :)) and there are only 6 curve sliders. The starting addresses are OK. Maybe the 7th is hidden from the user. There are also no temperature indicators per slider, just a heatmap with fixed sliders on it. I made a lot of dumps and will explore them now.

UPD:
In the last vim test only one (CPU) fan stopped.

@glpnk
Contributor

glpnk commented Jul 3, 2024

Try to reset the EC and BIOS following the guide in the user manual, then save a clean variant of the fan curves.

The 7th slider is probably a last resort before the CPU/GPU burns and an emergency shutdown

The temperatures may not be intended to be changed

WDYM by vim test? Literally dumping all memory, patching it and writing it back? Or changing certain values sequentially?

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

Try to reset the EC and BIOS following the guide in the user manual, then save a clean variant of the fan curves.

Yes, I do.

WDYM by vim test? Literally dumping all memory, patching it and writing it back? Or changing certain values sequentially?

Dumping, patching and writing back. xxd filters in vim are OP. But if you forget to apply one before :w, say hi to an EC reset :)

So after looking at the Windows logs, it turns out that MSI Control simply remembers my thermal settings and replaces them with the default curve when I go from advanced to balanced mode.
And after some tests in Linux I'm sure that it uses the curve (or part of it) even in "comfort" and "auto".

@glpnk
Contributor

glpnk commented Jul 3, 2024

You can check whether the curve is applied by making EC dumps under load and comparing the temperature and RPM % values with the curve.

MControlCenter sets the cooler mode to 0x8D in fan curve mode

Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode


Hmm, my device really doesn't care about the fan mode (custom/auto) when the first slider is changed

Silent mode seems to be fake, nice

@glpnk
Contributor

glpnk commented Jul 3, 2024

@Waujito can you compare the fan curves for auto and silent mode on Windows?

You can make the comparison easier by saving a dump in RWe and loading it in comparison mode. A second click on the compare button switches the view from the dump to realtime values and highlights the values that differ

If you need a map of EC values to WMI, here it is; with it you can just ignore many addresses
[image: map of EC values to WMI]

And if you know ImHex you can check this pattern https://github.com/glpnk/hexpats/blob/main/msi-wmi2-dsdt.hexpat

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

You can check whether the curve is applied by making EC dumps under load and comparing the temperature and RPM % values with the curve.

I compare the sound of my coolers -_-
Also, it seems that the address just before the beginning of the curve (0x71) indicates which speed from the curve is currently used. It is not a real value, just an indicator. It works only when the curve is enabled.

Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode

Do you mean that 0x72 controls something more than just the lowest possible fan speed?

Silent mode seems to be fake, nice

For me it works great (I just tried eco mode in Linux). It caps my CPU at 1.1GHz and takes fan control away from the curve. 0x71 seems frozen when this mode is enabled.

@glpnk
Contributor

glpnk commented Jul 3, 2024

Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode

Do you mean that 0x72 controls something more than just the lowest possible fan speed?

No. Basically, you can set 0x72 to any non-zero value and read the same value from 0x71 (if the CPU is cold enough), or a different value which might equal one of the next addresses 0x72-78.

A few models have a Basic fan mode, but it's hard to tell how it should work.

Silent mode seems to be fake, nice

For me it works great (I just tried eco mode in Linux). It caps my CPU at 1.1GHz and takes fan control away from the curve. 0x71 seems frozen when this mode is enabled.

Re-tested, and on Eco shift:

  • Silent fan + 0x72 = 70%
  • Auto fan + 0x72 = 70%

Sounds different.

So, Silent works, but still uses the fan curve

Also, it seems that the address just before the beginning of the curve (0x71) indicates which speed from the curve is currently used. It is not a real value, just an indicator. It works only when the curve is enabled.

True


Different shifts change the CPU power limit on some AMD devices

@glpnk
Contributor

glpnk commented Jul 3, 2024

Re-re tested and silent works on other shifts too

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

Oh, eco and silent are not the same thing. Eco seems to be like super battery. Silent is a fan mode...

@Waujito can you compare the fan curves for auto and silent mode on Windows?

The only thing that changes is 0x0d -> 0x1d in 0xd4, just like the driver does. And yes, I think it sounds different too

@glpnk
Contributor

glpnk commented Jul 3, 2024

Which app does your laptop use? MSI has at least 4 apps + 1 deprecated.

Shift/User scenario uses a combination of "shift" 0xd2/0xf2 (depending on laptop generation, we call it WMI2/1) and fan settings

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

Which app does your laptop use?

Shift/User scenario uses a combination of "shift" 0xd2/0xf2 (depending on laptop generation, we call it WMI2/1) and fan settings

MSI Center from the MS Store, as stated on the drivers page.
It writes mostly to the 0xD_ row

@glpnk
Contributor

glpnk commented Jul 3, 2024

Got AMD STAPM (CPU power limit) values by combo of shift and fan:

eco 10-12
silent + comfort 14
auto + comfort 24
sport 25

@Waujito
Contributor Author

Waujito commented Jul 3, 2024

But anyway, I cannot understand what's going on in auto mode when I set 0x72 to 00. My CPU fan just stops completely, independently of temperature. But I also can't say it is a static value. In auto mode the fan increases its speed.

What's the black magic behind that default 26 2b 30 36 3c 46 55, or 38 43 48 54 60 70 85 in decimal... It also doesn't seem to depend only on 0x72...

But in advanced mode everything works as it should

@glpnk
Contributor

glpnk commented Jul 3, 2024

Temp    RPM %
0x69    0x72
0x6A    0x73
...     ...
0x70    0x78

@glpnk
Contributor

glpnk commented Jul 3, 2024

0x69 = 0, 0x6a = 48; so when the CPU temp is between these, the 0x72 speed value is used
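The lookup described here can be sketched as a step function. This is an illustration, not the EC firmware's actual algorithm: it assumes seven ascending lower bounds starting at 0x69, each paired with the speed seven bytes after its neighbour (0x69 with 0x72, 0x6A with 0x73, and so on), and it treats each bound as inclusive. All names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define CURVE_POINTS 7

/*
 * thresholds[i] is the lower temperature bound of step i (ascending),
 * speeds[i] is the RPM % used while the temperature is in that step.
 * ASSUMPTION: a temperature equal to a bound belongs to the higher step.
 */
static uint8_t curve_speed_for_temp(const uint8_t thresholds[CURVE_POINTS],
				    const uint8_t speeds[CURVE_POINTS],
				    uint8_t temp)
{
	size_t step = 0;

	for (size_t i = 1; i < CURVE_POINTS; i++) {
		if (temp >= thresholds[i])
			step = i;
	}
	return speeds[step];
}
```

With the values from the decimal "with curve" dump (bounds 0, 48, 53, 60, 65, 70, 74, speeds 0, 43, 60, 75, 85, 100, 100, temperature 59 at 0x68) this returns 60, the same number that dump shows at 0x71. That agreement is suggestive, not proof of how the firmware works.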

Maybe the custom fan curve just breaks auto mode

@Waujito
Contributor Author

Waujito commented Jul 4, 2024

Yes, it's just broken. 0x72 may act as some kind of fraction. The temperature may also be shifted. I think the only thing we can do is back up the custom curve somewhere (once it is implemented), override it with the default curve, and restore it on every fan mode switch, just like the MSI software does.

@glpnk
Contributor

glpnk commented Jul 4, 2024

The fan curve is not implemented yet. Another question: is backing up the fan curve a task for the driver or for a userspace app?

@Waujito
Contributor Author

Waujito commented Jul 4, 2024

Hm, if so, isn't it a good option to just delete the custom fan curve, lock its file (return an error on read/write in fan mode auto) and force the user to write it when they want to enable the custom curve?

@glpnk
Contributor

glpnk commented Jul 4, 2024

No, because other devices might have properly implemented fan modes

@Waujito
Contributor Author

Waujito commented Jul 5, 2024

So we can reset and lock it only for specific devices, and instruct the userspace app to check it after advanced mode is enabled.
Or we can abstract this away and change the curve virtually: store it in memory and push it to the EC when advanced mode is enabled, for devices like mine.
Another question is how many devices are affected by this issue and how to detect it.
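The "virtual curve" option could look roughly like the sketch below: the custom curve lives in memory, and on every fan mode switch either it or the default curve is copied into the EC. Everything here is hypothetical (the struct, the names, and the use of an in-memory EC image instead of real EC writes); it only illustrates the idea and is not how msi-ec is implemented. Only the CPU speed bytes at 0x72 are handled to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CURVE_SPEED_START 0x72 /* first CPU fan speed byte, per this thread */
#define CURVE_SPEED_COUNT 7
#define SHADOW_EC_SIZE    256

/* Hypothetical driver-side state for devices with the broken auto mode. */
struct shadow_curve {
	uint8_t custom[CURVE_SPEED_COUNT];   /* user's curve, kept in memory */
	uint8_t fallback[CURVE_SPEED_COUNT]; /* default curve to restore */
};

/* On a fan mode switch, push the curve that matches the new mode. */
static void shadow_curve_apply(uint8_t image[SHADOW_EC_SIZE],
			       const struct shadow_curve *shadow,
			       bool advanced_mode)
{
	const uint8_t *src = advanced_mode ? shadow->custom : shadow->fallback;

	memcpy(&image[CURVE_SPEED_START], src, CURVE_SPEED_COUNT);
}
```

The EC then only ever sees the custom curve while advanced mode is active, which sidesteps the stopped-fan behaviour seen in auto mode.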

Btw it seems like my laptop works well now with msi-ec.
Battery threshold is OK (but I needed to discharge my battery to 60% for it to start working properly).
LEDs are likely fine too, but the audio-mute one doesn't indicate the muted state (not a module issue, related to pipewire I think).
CPU fan speed is fixed by the PR, as are the keyboard lights (they turned off when the driver was reloaded/exited, not related to a specific device).

@glpnk Thank you so much for your time. It was amazing to work with you! This project is really cool, thank you!

@TGODiamond
Contributor

Running echo 'Master Playback Switch' | sudo tee /sys/class/sound/ctl-led/speaker/card1/attach makes my audio-mute led work, only for it to stop working again after reboot. Yeah, it is definitely not a bug with the driver.

@glpnk
Contributor

glpnk commented Jul 6, 2024

@TGODiamond wow, thanks, where did you find this?

@TGODiamond
Contributor

I found out the signal isn't attached automatically, so I attached the signal to the LED manually. Found it out by myself :)

You can find the events with the amixer events command.

@TGODiamond
Contributor

I don't know where to file a bug report for this, so help is appreciated!

@teackot
Collaborator

teackot commented Nov 2, 2024

Merged! #139

@teackot teackot closed this as completed Nov 2, 2024