Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Add a patch to fix an NPD when using the UART and a stuck irq #134

Merged
merged 1 commit into from
Apr 22, 2024

Conversation

ukleinek
Copy link

When /dev/ttySTM1 is closed while there are still some characters pending to be sent, the kernel dereferences a null pointer which locks up the kernel.

Fix that by picking a patch from today's linux-next.

While the commit log claims this indeed fixes the problem noticed by @SmithChart, this isn't tested yet.

@hnez hnez self-requested a review April 10, 2024 09:05
@hnez hnez self-assigned this Apr 10, 2024
@hnez
Copy link
Member

hnez commented Apr 10, 2024

This seems to solve part of the problem. The good thing is that the kernel errors seen when closing a /dev/ttySTM* while there is still pending data is gone and now there is only a:

$ dmesg -w
…
[   68.147733] stm32-usart 4000f000.serial: Transmission is not complete

in its place.

But it gets worse when one tries to use the device again:

$ dmesg -w
…
[   78.434633] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
[   78.439387] spi_master spi2: failed to transfer one message from queue
[   78.450284] spi_master spi2: noqueue transfer failed
[   78.454254] panel-mipi-dbi-spi spi2.0: error -110 when sending command 0x2a
[   78.461289] lmp92064 spi1.0: SPI transfer timed out
…

followed by a watchdog induced reboot.

To me this looks like e.g. lower-priority interrupts no longer being handled.

@SmithChart
Copy link
Member

followed by a watchdog induced reboot.
To me this looks like e.g. lower-priority interrupts no longer being handled.

What @hnez reports still sounds like the behavior I've observed.

@ukleinek ukleinek closed this Apr 17, 2024
@ukleinek ukleinek force-pushed the fix-kernel-lockup branch from 6608071 to 4385047 Compare April 17, 2024 10:19
@ukleinek ukleinek reopened this Apr 17, 2024
@ukleinek
Copy link
Author

Closed by mistake, I think by accidentally pushing a wrong rev to my branch. Anyhow, now it should be right and fixed.

@ukleinek ukleinek changed the title Draft: kernel: Add a patch to fix an NPD when using the UART kernel: Add a patch to fix an NPD when using the UART Apr 18, 2024
@ukleinek ukleinek changed the title kernel: Add a patch to fix an NPD when using the UART kernel: Add a patch to fix an NPD when using the UART and a stuck irq Apr 19, 2024
@hnez
Copy link
Member

hnez commented Apr 19, 2024

I've tested this PR now and it looks like it fixes the issues we had. Nice to see this endeavour come to an end!

Before we merge it however #136 should be merged first and this PR be rebased so that the version number consistency check does not throw errors.

@SmithChart
Copy link
Member

#136 has been merged. Please rebase.

@ukleinek ukleinek force-pushed the fix-kernel-lockup branch from 3961c07 to 5d4fc9a Compare April 22, 2024 07:17
@ukleinek
Copy link
Author

I don't know what the version consistency check is about, but I rebased anyhow.

When /dev/ttySTM1 is closed while there are still some characters
pending to be sent, the kernel dereferences a null pointer which locks
up the kernel.

This is fixed by two patches cherry-picked from next.

Another problem is that if the UART is closed while being throttled,
the irq handler thinks it's still throttled after reopen which results
in it not handling RX events resulting in a stuck irq starving the
machine. This is fixed by two further patches. The first only improves
the driver's behaviour generally on stuck irq, the second is the actual
fix.

Signed-off-by: Uwe Kleine-König <[email protected]>
@ukleinek ukleinek force-pushed the fix-kernel-lockup branch from 5d4fc9a to bc2c396 Compare April 22, 2024 07:19
@ukleinek
Copy link
Author

(Pushed once more because I got the committer info wrong in the first force-push)

@hnez
Copy link
Member

hnez commented Apr 22, 2024

I don't know what the version consistency check is about, but I rebased anyhow.

It makes sure that we do not generate any images containing e.g. 24.04 as a version number after the 24.04 version tag is already set and instead have 24.04+dev as a version in meta-lxatac-software/conf/distro/tacos.conf.

PS: I've accidentially pressed "Close with comment" instead of "Cancel". When I decided that I did not want to add this comment. As a result I've added the comment in a half-ready state and closed the PR, both of which I did not mean to do. Oh my.

@hnez hnez closed this Apr 22, 2024
@hnez hnez reopened this Apr 22, 2024
@hnez hnez merged commit a99a241 into linux-automation:nanbield Apr 22, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants