scheduler: threads using k_sleep can be _swap()'d back too early #8128

mike-scott · 2018-06-03T21:53:49Z

Scheduler uses a function "should_preempt()" to determine if the _current thread should be _swap()'d away from. Part of that calculation uses a function _is_thread_prevented_from_running() which evaluates if the _current thread is "runnable".

Currently, the k_sleep() API doesn't make a thread look like it isn't runnable other than the following expression being true:
thread->base.timeout.delta_ticks_from_prev != _INACTIVE.

Originally, I felt that _is_thread_ready() would be a better replacement for _is_thread_prevented_from_running() as this takes into account active timeouts and would handle the k_sleep() case. But, a patch which changed this behavior is causing problems with the scheduler tests, so more research is needed.

mike-scott · 2018-06-03T21:56:01Z

@nashif @andyross I'm going to try and document every "issue" concerning the scheduler over the next day so we can see all of the various symptoms still in play. From there we can de-dupe as items are debugged.

This is issue #1. I attempted a patch:
#8126

But it's causing other issues, currently.

mike-scott · 2018-06-04T00:58:20Z

Found the bug in #8126 and I'm running my own personal soak test with 13 IoT devices to see how well they perform.

mike-scott · 2018-06-04T01:02:23Z

@andyross I ran a local sanitycheck with #8126 and it's passing now as well.

andyross · 2018-06-04T04:34:53Z

So.... looking at this from the context of trying to make it happen in a test, I'm flipping back. I thought I had a test case for this (about to submit) that showed the failure, but that was a bad assert. k_sleep() actually works correctly right now, though the logic is distressingly subtle:

Sleep does not flag the thread PENDING or otherwise unrunable, but it does call _remove_thread_from_ready_q()
_r_t_f_r_q() then does the removal, and (because _current is the thread being removed) calls update_cache() with preempt_ok as TRUE, indicating that the cache should be updated to reflect whatever next thread was found even if the current thread is otherwise nonpreemptible.
update_cache() then calls next_up() to retrieve the next thread, which because _current was removed is guaranteed to be something else (an idle thread if nothing else is available).
Then it calls should_preempt() to see if it should replace _current with whatever it found, which will be 1 because preempt_ok is a short-circuit case there.
Then, having updated the cache, we return back into k_sleep() and call _Swap(), guaranteeing we stop running and switch to the cache thrad. And because we are no longer in the run queue, we are guaranteed to

Basically, I don't see how this can be the bug. The k_sleep() state management works now for, effecitvely, the same reason it worked before the change. I agree we want to clean it up (ideally to make PENDING and "sleeping" the same state), but I don't see a logic failure.

All that said, I'm leaving the +1 in place and agree we should merge this. The change is correct logically, easily reviewed, and I'm willing to buy that it fixes some other yet-more-subtle state somewhere else, given that it seems to be resolving your problems.

andyross · 2018-06-04T04:51:25Z

FWIW: if you want to catch the presumptive bug closer to the act, one suggestion would be to enable assertions (if possible, otherwise just a manual test + log/abort/whatever) in should_preempt() that looks like:

__ASSERT_NO_MSG(_is_thread_prevented_from_running(_current) != _is_thread_ready(_current));

I'd be very curious to see the results. A pretty careful re-read and grep of every path I can see to _add_timeout() shows that it's never possible for a thread to have a timeout in a case where it gets inspected as the top of the run queue.

mike-scott · 2018-06-04T05:36:13Z

@andyross I agree with your assessment that k_sleep does manage to sort of work initially.

However, with patch applied, I've had no oops or hangs in the 5 hours of testing on several devices. Previously, I would have seen at least 1 device go offline by now.

I'll setup an assert to see if I can catch the path that is causing errors.

pfalcon · 2018-06-04T17:55:03Z

I'll setup an assert to see if I can catch the path that is causing errors.

Not sure if this should be reopened then.

mike-scott · 2018-06-04T18:03:20Z

@pfalcon Hard to say.

The reality is that 6c95daf uses the "correct" check which makes sure (even in some back water path) that the thread we are moving to is in fact ready.

It has also been running on several of my devices for the past 18-ish hours without resulting in a hang or kernel oops which was previously showing up after an hour or so.

@andyross has agreed that there is more work to do in the scheduler with regards to cleanup and the first patch he is submitting in v1.13 will move k_sleep towards using the same pend/unpend methods that the other APIs use so it won't be a snowflake like it is now.

Should we keep a general purpose "Cleanup the scheduler and look for concurrency bugs" issue open? Maybe? But I think this issue is a bit too specific for that.

pfalcon · 2018-06-04T18:22:07Z

Sounds good, I hope news about the assert above won't be lost/forgotten then ;-) (even if the news are "didn't get to it").

mike-scott · 2018-06-05T19:20:17Z

So the k_sleep() issue is real.

Using this diff on master branch: https://hastebin.com/ihoyazofad.diff

I generate this log while performing an OTA update using Zephyr sample: https://hastebin.com/uxojakapul.log

Here's the backtrace from gdb: https://hastebin.com/bifoneripu.pas

(gdb) bt
#0  k_cpu_idle () at /home/scottml/osf/zmp-osf/zephyr/arch/arm/core/cpu_idle.S:135
#1  0x0000cdd0 in _SysFatalErrorHandler (reason=<optimized out>, pEsf=<optimized out>)
    at /home/scottml/osf/zmp-osf/zephyr/arch/arm/core/sys_fatal_error_handler.c:70
#2  0x0000c94e in _oops () at /home/scottml/osf/zmp-osf/zephyr/arch/arm/core/swap_helper.S:345
#3  <signal handler called>
#4  should_preempt (th=<optimized out>, preempt_ok=0) at /home/scottml/osf/zmp-osf/zephyr/kernel/sched.c:129
#5  update_cache (preempt_ok=preempt_ok@entry=0) at /home/scottml/osf/zmp-osf/zephyr/kernel/sched.c:215
#6  0x0002cc78 in _add_thread_to_ready_q (thread=0x200043f8 <prio_recv_thread_data>)
    at /home/scottml/osf/zmp-osf/zephyr/kernel/sched.c:237
#7  0x0002cf90 in _ready_thread (thread=0x200043f8 <prio_recv_thread_data>)
    at /home/scottml/osf/zmp-osf/zephyr/kernel/include/ksched.h:240
#8  do_sem_give (sem=0x2000a118 <sem_prio_recv>) at /home/scottml/osf/zmp-osf/zephyr/kernel/sem.c:102
#9  _impl_k_sem_give (sem=0x2000a118 <sem_prio_recv>) at /home/scottml/osf/zmp-osf/zephyr/kernel/sem.c:137
#10 0x0001af80 in isr_rx_conn (trx_done=<optimized out>, rssi_ready=<optimized out>, crc_ok=1 '\001')
    at /home/scottml/osf/zmp-osf/zephyr/subsys/bluetooth/controller/ll_sw/ctrl.c:3695
#11 isr_radio_state_rx (rssi_ready=<optimized out>, irkmatch_id=<optimized out>, irkmatch_ok=<optimized out>, 
    devmatch_id=<optimized out>, devmatch_ok=<optimized out>, crc_ok=1 '\001', trx_done=<optimized out>)
    at /home/scottml/osf/zmp-osf/zephyr/subsys/bluetooth/controller/ll_sw/ctrl.c:3850
#12 isr () at /home/scottml/osf/zmp-osf/zephyr/subsys/bluetooth/controller/ll_sw/ctrl.c:4393
#13 0x0001c9fc in radio_nrf5_isr_body () at /home/scottml/osf/zmp-osf/zephyr/subsys/bluetooth/controller/ll_sw/ll.c:83
#14 radio_nrf5_isr () at /home/scottml/osf/zmp-osf/zephyr/subsys/bluetooth/controller/ll_sw/ll.c:81
#15 <signal handler called>
#16 __pendsv () at /home/scottml/osf/zmp-osf/zephyr/arch/arm/core/swap_helper.S:104
#17 <signal handler called>
#18 0x00000006 in ?? ()
---Type <return> to continue, or q <return> to quit---
#19 0x000000fe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

mike-scott · 2018-06-05T19:20:31Z

@pfalcon @andyross ^^

andyross · 2018-06-05T19:23:38Z

Cool, thanks. Will rework for 1.13 and bug you to retest on the rig when it's ready.

mike-scott · 2018-06-05T19:35:49Z

Also here's the _kernel object: https://hastebin.com/ducusuxata.xml

This shows:

_current = 0x200027a0 <engine_thread_data> which is in LwM2M client using k_sleep(500)
timeout_q also shows <engine_thread_data> & <net_neighbor_pool> in waiting status
ready_q.cache is set to <_idle_thread_s>
the only thread in the runq is <prio_recv_thread_data>

Backtrace shows:

ISR woke up bluetooth controller
eventually do_sem_give() was called for <sem_prio_recv>
in do_sem_give():
1. we called _ready_thread(<prio_recv_thread_data>)
2. which calls _add_thread_to_ready_q(<prio_recv_thread_data>)
3. which calls update_cache(0)
4. which calls should_preempt()

Discussion points:
Q: Why the _current thread is still <engine_thread_data>?
A: Is it possible we had the ISR happen right in the middle of the k_sleep call after <engine_thread_data> was removed from the runq?

Q: Why is the readyq.cache showing <_idle_thread_s>?

This in the backtrace looks odd:

#15 <signal handler called>
#16 __pendsv () at /home/scottml/osf/zmp-osf/zephyr/arch/arm/core/swap_helper.S:104
#17 <signal handler called>

Were we in the middle of another __pendsv? when the ISR happened?

mike-scott · 2018-06-05T19:38:38Z

@andyross I don't think the scheduler is working correctly (still). The k_sleep() patch merely cleaned up the logic in should_preempt. But, TBH the above circumstances should never happen.

andyross · 2018-06-05T19:39:50Z

Scheduling decisions should be atomic. I spent way too much time staring at the synchronization over the last too days and found nothing. But it's always possible to miss something. If you're wanting to hunt yourself, MOST of the synchronization is in the LOCKED() blocks and can be easily verified to be a sane critical section. The scarier bits are in the timeout management code which use traditional irq_lock().

(Now I know you're shouting "Ah hah!" here, since it's timeout code we're specifically looking at. And I agree. Except I couldn't see anything wrong. Nonetheless a cleanup is going to move the timeout stuff under the scheduler spinlock where it belongs, so a fix it at least plausible)

And surely a PendSV exception is not itself interruptible, no? I can't imagine how that would work if it was.

mike-scott · 2018-06-05T19:44:16Z

AH nm, the issue with readyq.cache pointing to idle. We're already panicked and I'm looking at current state, not what it was prior to k_panic. I'll add a logging point to print readyq.cache before panic.

mike-scott · 2018-06-05T21:54:05Z

@andyross as a follow-up. The above stack is completely legitimate as the k_sleep() thread was the _current thread when the IRQ hit.

I think the underlying problem is that the LwM2M engine service thread has too high of a priority for what it's doing. And it can't be preempted by most other threads.

mike-scott · 2018-06-06T00:58:15Z

@andyross I think we have a problem with k_sleep() still. It has to do with this change:
5b37cd7#diff-7f89e4cbbdcbb89c82dd2cbb00343602

It assumes that all calls to __swap() go through PendSV. We know that k_sleep() doesn't pend() the thread like other paths. This means we're in an "interesting" state when we hit line 104 "msr BASEPRI, r0" in swap_helper.S doesn't it?

mike-scott · 2018-06-06T01:21:44Z

Actually the bug here (if there is one) is that _current isn't update till 2 lines AFTER the msr BASEPRI, r0 call. Meaning that an IRQ hitting once interrupts are re-enabled will trigger all of the ready_thread code and use the OLD _current thread value for comparison skipping the new_thread entirely.

mike-scott · 2018-06-06T01:23:54Z

Hrm.. that's not true either.. we shouldn't be getting an interrupt at line 104 at all.

We don't turn back on interrupts till line 163: msr BASEPRI, r0

mike-scott · 2018-06-06T03:15:20Z

@cvinayak for reference

andyross · 2018-06-28T16:56:40Z

Drop priority as a workaround was commited for 1.12. The root cause (PendSV can be interrupted before entry, making _Swap() inherently non-atomic on ARM!) needs to be documented in a separate bug and a ARM-only workaround put in place for 1.13.

mike-scott · 2018-08-16T21:40:48Z

@andyross I think this can be closed. Correct?

nashif · 2018-08-17T14:43:38Z

yes, lets close it

andyross · 2018-08-17T16:00:19Z

Yes, this is worked around. I really should write up a bug on the root cause issue (_Swap() on ARM is nonatomic because of the way pendsv works), and the code should cleanly reflect that and maybe implement it in an ARM-specific way instead of checking scheduler pending states needlessly. Someone please nag me to do that.

mike-scott assigned andyross Jun 3, 2018

mike-scott added the area: Kernel label Jun 3, 2018

mike-scott added this to the v1.12.0 milestone Jun 3, 2018

mike-scott mentioned this issue Jun 4, 2018

kernel: sched: use _is_thread_ready() in should_preempt() #8126

Merged

andyross mentioned this issue Jun 4, 2018

tests/kernel/preempt: Add yield and sleep cases #8133

Merged

nashif closed this as completed in #8126 Jun 4, 2018

andyross mentioned this issue Jun 4, 2018

kernel/queue: Fix spurious NULL exit condition when using timeouts #8161

Merged

pfalcon changed the title ~~scheduler: threads using k_sleep can be _swap()'d back to early~~ scheduler: threads using k_sleep can be _swap()'d back too early Jun 4, 2018

mike-scott reopened this Jun 5, 2018

andyross modified the milestones: v1.12.0, v1.13.0 Jun 5, 2018

andyross added the priority: high High impact/importance bug label Jun 5, 2018

nashif added the bug The issue is a bug, or the PR is fixing a bug label Jun 5, 2018

andyross added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels Jun 28, 2018

nashif closed this as completed Aug 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scheduler: threads using k_sleep can be _swap()'d back too early #8128

scheduler: threads using k_sleep can be _swap()'d back too early #8128

mike-scott commented Jun 3, 2018 •

edited

Loading

mike-scott commented Jun 3, 2018

mike-scott commented Jun 4, 2018

mike-scott commented Jun 4, 2018

andyross commented Jun 4, 2018

andyross commented Jun 4, 2018

mike-scott commented Jun 4, 2018

pfalcon commented Jun 4, 2018

mike-scott commented Jun 4, 2018

pfalcon commented Jun 4, 2018

mike-scott commented Jun 5, 2018 •

edited by nashif

Loading

mike-scott commented Jun 5, 2018

andyross commented Jun 5, 2018

mike-scott commented Jun 5, 2018 •

edited by nashif

Loading

mike-scott commented Jun 5, 2018

andyross commented Jun 5, 2018

mike-scott commented Jun 5, 2018

mike-scott commented Jun 5, 2018

mike-scott commented Jun 6, 2018

mike-scott commented Jun 6, 2018 •

edited

Loading

mike-scott commented Jun 6, 2018

mike-scott commented Jun 6, 2018

andyross commented Jun 28, 2018

mike-scott commented Aug 16, 2018

nashif commented Aug 17, 2018

andyross commented Aug 17, 2018

scheduler: threads using k_sleep can be _swap()'d back too early #8128

scheduler: threads using k_sleep can be _swap()'d back too early #8128

Comments

mike-scott commented Jun 3, 2018 • edited Loading

mike-scott commented Jun 3, 2018

mike-scott commented Jun 4, 2018

mike-scott commented Jun 4, 2018

andyross commented Jun 4, 2018

andyross commented Jun 4, 2018

mike-scott commented Jun 4, 2018

pfalcon commented Jun 4, 2018

mike-scott commented Jun 4, 2018

pfalcon commented Jun 4, 2018

mike-scott commented Jun 5, 2018 • edited by nashif Loading

mike-scott commented Jun 5, 2018

andyross commented Jun 5, 2018

mike-scott commented Jun 5, 2018 • edited by nashif Loading

mike-scott commented Jun 5, 2018

andyross commented Jun 5, 2018

mike-scott commented Jun 5, 2018

mike-scott commented Jun 5, 2018

mike-scott commented Jun 6, 2018

mike-scott commented Jun 6, 2018 • edited Loading

mike-scott commented Jun 6, 2018

mike-scott commented Jun 6, 2018

andyross commented Jun 28, 2018

mike-scott commented Aug 16, 2018

nashif commented Aug 17, 2018

andyross commented Aug 17, 2018

mike-scott commented Jun 3, 2018 •

edited

Loading

mike-scott commented Jun 5, 2018 •

edited by nashif

Loading

mike-scott commented Jun 5, 2018 •

edited by nashif

Loading

mike-scott commented Jun 6, 2018 •

edited

Loading