assert hit on PM_IDLE exit in subsys/pm/pm.c:133 #69807
Comments
Notable Kconfig settings used that affect the PM state transitions: CONFIG_ADSP_POWER_DOWN_HPSRAM=y
Update to Zephyr in sof/main-rebase-20240305 branch of SOF project's clone of Zephyr upstream repository. Revert one Zephyr commit "pm: Remove CURRENT_CPU macro" that is leading to failed tests in SOF CI test suite. The revert allows us to update Zephyr to a newer version and tackle the SMP boot and cache interface changes in SOF. The latest Zephyr upstream has further changes needed in SOF for platform configuration and these will require separate changes. Link: thesofproject#8818 Link: zephyrproject-rtos/zephyr#69807 Signed-off-by: Kai Vehmanen <[email protected]>
Still hit the error with Zephyr main of today (commit 0d5a670). Needed some HWMv2 work to make SOF build again. The issue does not happen on every run, so reproduction is not 100%, but with enough repeats I still get it. The crash looks similar, so interrupt state is unexpected when system resume is done:
This reverts commit b9d4b9d. With this patch in place, system resume occasionally fails with an assert on the Intel ACE platform. Revert the change until a proper fix is found. Link: zephyrproject-rtos#69807 Signed-off-by: Kai Vehmanen <[email protected]>
@ceolin wrote:
In local testing this at least affects the reproduction rate. I have now scheduled a larger test plan with SOF CI infra at thesofproject/sof#8928 and will update results here. UPDATE: results seem good, there's one fail on MTL, but the DSP panics are gone:
@ceolin It does seem the power gated state is not entered when the bug occurs, and PR68493 does help. See comment in #68493 (comment)
In case the core is not power gated, waiti will restore intlevel. In this case we lock interrupts after it. In the bug scenario, the host starts streaming and, via SOF APIs, keeps a lock to prevent Zephyr from entering PM_STATE_RUNTIME_IDLE. During the test case, the host removes this block and core0 is allowed to enter IDLE state. When core0 enters power gated state, interrupts are left enabled (so the core can be woken up when something happens). This leaves a race where a suitably timed interrupt will actually block entry to power gated state and k_cpu_idle() in power_gate_entry() will return. This is rare, but happens often enough that the relatively short test plan run on SOF pull-requests will trigger this case. Fixes zephyrproject-rtos#69807 Signed-off-by: Flavio Ceolin <[email protected]> Signed-off-by: Anas Nashif <[email protected]>
In case the core is not power gated, waiti will restore intlevel. In this case we lock interrupts after it. In the bug scenario, the host starts streaming and, via SOF APIs, keeps a lock to prevent Zephyr from entering PM_STATE_RUNTIME_IDLE. During the test case, the host removes this block and core0 is allowed to enter IDLE state. When core0 enters power gated state, interrupts are left enabled (so the core can be woken up when something happens). This leaves a race where a suitably timed interrupt will actually block entry to power gated state and k_cpu_idle() in power_gate_entry() will return. This is rare, but happens often enough that the relatively short test plan run on SOF pull-requests will trigger this case. Fixes #69807 Signed-off-by: Flavio Ceolin <[email protected]> Signed-off-by: Anas Nashif <[email protected]> (cherry picked from commit 07426a8)
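The race described in the commit message can be sketched as a small standalone model. This is not the Zephyr or SOF code: every `model_*` name, the `MODEL_INTLEVEL_LOCKED` value and the `apply_fix` flag are hypothetical and exist only to illustrate the ordering, assuming the resume path expects interrupts to be locked when the idle call returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "all interrupts masked" level, for illustration only. */
#define MODEL_INTLEVEL_LOCKED 5

/* Modeled interrupt level of core0; starts locked, as on PM entry. */
static int model_intlevel = MODEL_INTLEVEL_LOCKED;

/* Models locking interrupts again after the idle instruction returns. */
static void model_irq_lock(void)
{
	model_intlevel = MODEL_INTLEVEL_LOCKED;
}

/*
 * Models the power gate entry path. Returns true if the core was gated.
 * irq_races_in: an interrupt arrives before the core is actually gated.
 * apply_fix:    re-lock interrupts when the idle call returns early.
 */
static bool model_power_gate_entry(bool irq_races_in, bool apply_fix)
{
	/* The caller is assumed to enter with interrupts locked. */
	assert(model_intlevel == MODEL_INTLEVEL_LOCKED);

	if (!irq_races_in) {
		/* Assumption: a real power gate resumes in a locked state. */
		return true;
	}

	/* The idle call returned without gating: interrupts are open. */
	model_intlevel = 0;

	if (apply_fix) {
		model_irq_lock();
	}
	return false;
}

/* Models the check made on resume: interrupts must still be locked. */
static bool model_resume_state_ok(void)
{
	return model_intlevel == MODEL_INTLEVEL_LOCKED;
}
```

Without the re-lock, the early return leaves the model in the state the resume check rejects; with it, the check passes whether or not the core was actually gated.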
Incredibly stupid, GitHub auto-close anti-feature: https://github.com/orgs/community/discussions/17308 The tentative fix is Issues should be closed when tests are passing again and never before.
@marc-hb We do have a separate bug in SOF to track this issue thesofproject/sof#8908, so I think we can close this bug on Zephyr as I've tested the fix. We can use the SOF bug to track integration of the fix into SOF (but that's outside Zephyr). Let's keep open for 24h as we complete more testing.
Sorry I confused this with thesofproject/sof#8908 (the feature is still stupid, it didn't know you tested it)
Describe the bug
Starting with commit b9d4b9d ("pm: Remove CURRENT_CPU macro"), SOF public tests have started failing with
ASSERTION FAIL [!z_smp_cpu_mobile()] @ ZEPHYR_BASE/subsys/pm/pm.c:133
[ 220.790855] os: print_fatal_exception: ** FATAL EXCEPTION
... the system is running an active use-case on core0 and the assert happens when SOF releases the IDLE state lock with
"pm_policy_state_lock_put(PM_STATE_RUNTIME_IDLE, PM_ALL_SUBSTATES);" and allows Zephyr to enter RUNTIME_IDLE on core0.
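The lock release above can be pictured as a reference count: while any holder keeps the lock, the state must not be selected, and the last put makes it eligible again. The sketch below is a standalone model under that assumption, not the Zephyr implementation; `model_lock_get`, `model_lock_put` and `model_idle_allowed` are hypothetical stand-ins for the get/put pair and the policy decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of holders blocking RUNTIME_IDLE. */
static int model_idle_lock_count;

/* Models taking the state lock, e.g. when the host starts streaming. */
static void model_lock_get(void)
{
	model_idle_lock_count++;
}

/* Models releasing the state lock; a put without a get is a bug. */
static void model_lock_put(void)
{
	assert(model_idle_lock_count > 0);
	model_idle_lock_count--;
}

/* The state may be entered only when nobody holds the lock. */
static bool model_idle_allowed(void)
{
	return model_idle_lock_count == 0;
}
```

In this model the assert reported here would correspond to the first idle entry after the final `model_lock_put()`, which is the point where core0 is allowed to enter RUNTIME_IDLE again.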
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Test passes.
Impact
Loss of audio on end-user system.
Logs and console output
https://sof-ci.01.org/sofpr/PR8901/build3107/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=check-playback-10sec
See the "mtrace" tab to see Zephyr log.
Environment (please complete the following information):
Additional context
With the identified Zephyr commit revert, the same test passes reliably with no other failures seen in the test.
cc: