Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cortexm_common: disable IRQ during thread_sched_idle #14534

Merged
merged 1 commit into from
Jul 16, 2020

Conversation

bergzand
Copy link
Member

@bergzand bergzand commented Jul 16, 2020

Contribution description

A race condition is present where an IRQ is serviced between the
priority increase of the PENDSV and the sleep. When the IRQ
is serviced before the WFI sleep, the core will sleep until the next
IRQ and the thread activated by the IRQ will not be scheduled until
a new IRQ triggers.

This commit wraps an IRQ disable and restore around the priority
modification and sleep to prevent interrupts from being serviced until
the WFI call returns.

Testing procedure

Issues/PRs references

Fixes #14521
See #14224 (comment)

A race condition is present where an IRQ is serviced between the
priority increase of the PENDSV and the sleep. When the IRQ
is serviced before the WFI sleep, the core will sleep until the next
IRQ and the thread activated by the IRQ will not be scheduled until
a new IRQ triggers.

This commit wraps an IRQ disable and restore around the priority
modification and sleep to prevent interrupts from being serviced until
the WFI call returns.
@bergzand bergzand added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Jul 16, 2020
@miri64 miri64 added Area: cpu Area: CPU/MCU ports Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch labels Jul 16, 2020
@miri64 miri64 added this to the Release 2020.07 milestone Jul 16, 2020
@miri64
Copy link
Member

miri64 commented Jul 16, 2020

Confirmed, that this fixes #14534.

@fjmolinas
Copy link
Contributor

This fixes the issue mentioned in #14224 (comment) as well.

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy, and @bergzand mentioned in offline as well about all the nested irq_disable/enable we are having, but this is the best fix ATM. So ACK!

@fjmolinas fjmolinas added CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jul 16, 2020
@miri64
Copy link
Member

miri64 commented Jul 16, 2020

Failing tests are the usual suspects. I think the nrf52dk worker is broken again :-/

    failed:
    tests/riotboot_hdr/nrf52dk:gnu
    tests/memarray/nrf52dk:gnu
    tests/pkg_heatshrink/nrf52dk:llvm
    tests/float/nrf52dk:gnu
    tests/bloom_bytes/nrf52dk:gnu
    tests/pkg_ucglib/nrf52dk:gnu
    tests/bench_runtime_coreapis/nrf52dk:llvm
    tests/l2util/nrf52dk:llvm
    tests/gnrc_netif/nrf52dk:gnu
    tests/pkg_relic/nrf52dk:gnu
    tests/pkg_minmea/nrf52dk:llvm
    tests/progress_bar/nrf52dk:gnu
    tests/gnrc_sock_ip/nrf52dk:llvm
    tests/pkg_c25519/nrf52dk:llvm
    tests/bitarithm_timings/nrf52dk:llvm
    tests/cpp_exclude/nrf52dk:llvm
    tests/pkg_u8g2/nrf52dk:llvm
    tests/bench_xtimer_load/nrf52dk:llvm
    tests/heap_cmd/nrf52dk:llvm
    tests/pkg_tinycrypt/nrf52dk:gnu
    tests/rmutex/nrf52dk:llvm
    tests/gnrc_sixlowpan_iphc_w_vrb/nrf52dk:llvm
    tests/periph_hwrng/nrf52dk:llvm
    tests/pthread/nrf52dk:llvm
    tests/thread_msg_seq/nrf52dk:gnu
    tests/evtimer_underflow/nrf52dk:gnu
    tests/driver_bme680/nrf52dk:gnu
    tests/driver_bme680/nrf52dk:llvm
    tests/lwip_sock_tcp/nrf52dk:gnu
    tests/riotboot/nrf52dk:gnu
    tests/pkg_micro-ecc-with-hwrng/nrf52dk:gnu
    tests/driver_my9221/nrf52dk:gnu
    tests/posix_time/nrf52dk:llvm
    tests/msg_try_receive/nrf52dk:gnu
    tests/thread_basic/nrf52dk:llvm
    tests/thread_msg_bus/nrf52dk:gnu
    tests/evtimer_msg/nrf52dk:gnu
    tests/trace/nrf52dk:llvm
    tests/driver_hd44780/nrf52dk:gnu
    tests/cb_mux_bench/nrf52dk:llvm
    tests/build_system_cflags_spaces/nrf52dk:gnu
    tests/pkg_umorse/nrf52dk:gnu
    tests/bench_xtimer/nrf52dk:gnu
    tests/gnrc_rpl_p2p/nrf52dk:gnu
    tests/pkg_cayenne-lpp/nrf52dk:llvm
    tests/mutex_order/nrf52dk:llvm
    tests/pkg_littlefs/nrf52dk:llvm
    tests/gnrc_sock_async_event/nrf52dk:llvm
    tests/pkg_monocypher/nrf52dk:llvm
    tests/phydat_dump/nrf52dk:llvm
    tests/fmt_print/nrf52dk:gnu
    tests/lwip_sock_ip/nrf52dk:gnu
    tests/mpu_noexec_ram/nrf52dk:gnu
    tests/pkg_libfixmath/nrf52dk:llvm
    tests/cb_mux/nrf52dk:gnu
    tests/pkg_ubasic/nrf52dk:llvm
    tests/blob/nrf52dk:llvm
    tests/pkg_libcoap/nrf52dk:llvm
    tests/log_color/nrf52dk:gnu
    tests/thread_cooperation/nrf52dk:llvm
    tests/xtimer_reset/nrf52dk:llvm
    tests/bench_thread_flags_pingpong/nrf52dk:llvm
    tests/xtimer_usleep_short/nrf52dk:gnu
    tests/bench_sched_nop/nrf52dk:gnu
    tests/cpp11_thread/nrf52dk:gnu
    tests/ztimer_msg/nrf52dk:llvm
    tests/shell/nrf52dk:gnu
    tests/xtimer_now64_continuity/nrf52dk:gnu
    tests/gnrc_ipv6_fwd_w_sub/nrf52dk:llvm
    tests/xtimer_hang/nrf52dk:gnu
    tests/thread_flood/nrf52dk:llvm
    tests/thread_zombie/nrf52dk:llvm
    tests/buttons/nrf52dk:gnu
    tests/bench_sizeof_coretypes/nrf52dk:gnu
    tests/bench_sizeof_coretypes/nrf52dk:llvm

@miri64 miri64 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jul 16, 2020
@miri64 miri64 merged commit ea8e867 into RIOT-OS:master Jul 16, 2020
@miri64
Copy link
Member

miri64 commented Jul 16, 2020

@bergzand please provide a backport.

@benpicco
Copy link
Contributor

benpicco commented Jul 16, 2020

Isn't the root cause of that problem that RIOT will configure all interrupts with the same priority?
e.g. cortexm_init.c all interrupts are configured to CPU_DEFAULT_IRQ_PRIO.
PendSV_IRQn is set to CPU_CORTEXM_PENDSV_IRQ_PRIO, but this also defaults to CPU_DEFAULT_IRQ_PRIO.

So instead of disabling interrupts, why not just give a higher priority to the PendSV interrupt to begin with?

@bergzand
Copy link
Member Author

So instead of disabling interrupts, why not just give a higher priority to the PendSV interrupt to begin with?

That's exactly what is described here.

/**
* @brief Define a separate priority for the PendSV interrupt
*
* According to the ARM Cortex-M documentation, the recommended best
* practice is to place the PendSV at the lowest interrupt priority.
* By default, RIOT runs PendSV on the same interrupt priority with all
* other interrupts.
*
* For efficency (or other reasons), one may want to run the PendSV as
* the last one, just before returning to the (next) thread. However,
* since PendSV triggers the RIOT scheduler _without_ interrupts being
* disabled, any interrupts that pre-empt the scheduler, including the
* timer interrupts, must not call anything that may affect the
* scheduler, such as mutex or scheduler functions. With the current
* design of RIOT, writing interrupt handlers in such a manner is not
* exactly trivial.
*
* An experimental way to to run PendSV as the last thing before
* returning to the user thread context is to enable Cortex-M
* sub-priorities with @ref CPU_CORTEXM_PRIORITY_GROUPING and then
* make the PendSV interrupt sub-priority lower than the default.
* (Remember, on Cortex-M lower urgency means higher priority number.)
*
* For now, by default, we preserve the traditional RIOT behaviour, but
* allow specific CPUs, boards, or apps to change this.
*
* See cpu/cortexm_common/cortexm_init.c how these are used.
*
* If you want to set this, define it in your `cpu_conf.h`.
*/
#ifndef CPU_CORTEXM_PENDSV_IRQ_PRIO
#define CPU_CORTEXM_PENDSV_IRQ_PRIO (CPU_DEFAULT_IRQ_PRIO)
#endif
/** @} */

It can probably be done in the future, but I don't think the current scheduler routines are exactly thread safe (as already shown in this PR).

@bergzand bergzand deleted the pr/cortexm/irq_during_idle branch July 16, 2020 18:33
@bergzand
Copy link
Member Author

See also https://developer.arm.com/documentation/ka001284/1-0/?search=5eec6e71e24a5e02d07b259a for some background reading on this:

When interrupts are disabled by use of PRIMASK (CPSID i), the processor will wake up from WFI when it receives an interrupt whose priority would preempt the current execution context if PRIMASK were not set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests/test_tools: fails on iotlab-m3 due to core/cpu regression
4 participants