Ztest "1cpu" cases don't retarget interrupts on x86_64 #21216
Labels
area: SMP
Symmetric multiprocessing
area: Tests
Issues related to a particular existing or missing test
area: X86_64
x86-64 Architecture (64-bit)
bug
The issue is a bug, or the PR is fixing a bug
priority: low
Low impact/importance bug
Stale
On SMP, many tests weren't designed to work with multiple CPUs and use a "1cpu" ztest variant instead. The way this works is fairly crude: it spawns a thread which locks interrupts and spins, forcing the test to operate on only one CPU.
But with x86_64 in particular, with the default IO-APIC destination settings for an interrupt ("fixed" delivery to the "lowest priority" "physical" CPU), the HPET timer interrupt sometimes gets directed to the locked CPU. Obviously this doesn't get handled, and the test will fail (usually hanging).
This has ping-ponged in the source tree. The original (single-CPU) targeting was replaced in commit 5a9a33b with a "logical" delivery to a CPU/APIC ID of 0xff, which (in qemu at least) works to broadcast the interrupt to all CPUs. But this failed on UP2 hardware and got reverted in commit 005aff7, accidentally introducing the bug detailed here, and had to be re-reverted in 23bddde.
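For readers less familiar with the IO-APIC, the difference between the two targeting schemes comes down to a few fields of the 64-bit redirection table entry. The sketch below is only an illustration of that bit layout: the macro and function names are hypothetical (they are not Zephyr's), and the vector number used in the usage note is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of a 64-bit IO-APIC redirection table entry (RTE).
 * All names here are illustrative, not taken from the Zephyr tree.
 */
#define RTE_DELIV_FIXED   (0ULL << 8)   /* delivery mode 000b: fixed */
#define RTE_DELIV_LOWPRI  (1ULL << 8)   /* delivery mode 001b: lowest priority */
#define RTE_DEST_PHYSICAL (0ULL << 11)  /* destination field is an APIC ID */
#define RTE_DEST_LOGICAL  (1ULL << 11)  /* destination field is a logical mask */
#define RTE_MASKED        (1ULL << 16)  /* interrupt is masked at the IO-APIC */
#define RTE_DEST_SHIFT    56            /* destination field occupies bits 56..63 */

/* Assemble an RTE from a vector, a set of mode flags, and a destination. */
static uint64_t make_rte(uint8_t vector, uint64_t flags, uint8_t dest)
{
    /* Flags must not spill into the vector field (bits 0..7). */
    assert((flags & 0xffULL) == 0);
    return (uint64_t)vector | flags | ((uint64_t)dest << RTE_DEST_SHIFT);
}

/* Extract the destination field from an RTE. */
static uint8_t rte_dest(uint64_t rte)
{
    return (uint8_t)(rte >> RTE_DEST_SHIFT);
}

/* Report whether an RTE uses logical destination mode. */
static int rte_is_logical(uint64_t rte)
{
    return (rte & RTE_DEST_LOGICAL) != 0;
}
```

Under these assumptions, `make_rte(0x20, RTE_DELIV_FIXED | RTE_DEST_PHYSICAL, 0)` would describe delivery to the single CPU with APIC ID 0, while `make_rte(0x20, RTE_DELIV_FIXED | RTE_DEST_LOGICAL, 0xff)` would describe the logical 0xff destination discussed above.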
This is fairly rare in practice: just one test fails with notable frequency. But this really needs some kind of architectural solution. I can see two good ones:
1. Augment the arch layer on SMP platforms with a "mask and disable delivery of interrupts" API that can be used to disable interrupts for long term tasks like this.
2. Deprecate the "1cpu" ztest feature and make all test cases SMP-safe by design. This is a ton of work, and some tests make some hard assumptions about preemption behavior (e.g. it's really not possible to get kernel/sched/preempt to work at all without knowing that everything happens on one CPU).
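To make the first option a bit more concrete, here is a toy model of what a "mask and disable delivery" API might do. Everything in it is hypothetical: the function names, the four-CPU limit, and the "pick the lowest eligible CPU" routing rule are assumptions made for illustration, not an existing Zephyr arch interface or a model of real IO-APIC arbitration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CPUS 4  /* assumed CPU count for this toy model */

/* Bit N set means CPU N may currently be chosen as an interrupt target. */
static uint32_t deliverable = (1u << NUM_CPUS) - 1;

/* Hypothetical arch hook: take a CPU out of the set of interrupt targets. */
static void cpu_irq_delivery_disable(unsigned int cpu)
{
    assert(cpu < NUM_CPUS);
    deliverable &= ~(1u << cpu);
}

/* Hypothetical arch hook: make a CPU eligible for interrupts again. */
static void cpu_irq_delivery_enable(unsigned int cpu)
{
    assert(cpu < NUM_CPUS);
    deliverable |= (1u << cpu);
}

/* Pick a target among the candidate CPUs, skipping any that have had
 * delivery disabled. Returns the chosen CPU, or -1 if none is eligible.
 */
static int route_irq(uint32_t candidates)
{
    uint32_t eligible = candidates & deliverable;

    for (unsigned int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (eligible & (1u << cpu)) {
            return (int)cpu;
        }
    }
    return -1;
}
```

In this model, a "1cpu" holder thread would call `cpu_irq_delivery_disable()` for its own CPU before locking interrupts and spinning, so the timer interrupt could only be routed to a CPU that is still able to handle it, and would call `cpu_irq_delivery_enable()` again when the test finishes.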