Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k_wakeup follwed by k_thread_resume call causes system freeze #28694

Closed
rhardik opened this issue Sep 25, 2020 · 4 comments · Fixed by #29299
Closed

k_wakeup follwed by k_thread_resume call causes system freeze #28694

rhardik opened this issue Sep 25, 2020 · 4 comments · Fixed by #29299
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug

Comments

@rhardik
Copy link

rhardik commented Sep 25, 2020

Describe the bug
I've added a while loop with 8 seconds sleep in a thread. Now by mistake I called both k_wakeup and k_thread_resume which is causing system to get freeze without any crash logs.

To Reproduce
Steps to reproduce the behaviour:

  1. $ git clone -b thread-handling-issue https://github.com/rhardik/reproduce-zephyr-issues.git
    2.$ cd reproduce-zephyr-issues
  2. copy issue-1 folder to zephyr compilation environment
  3. $ cd issue-1 && west build -b <board_name> (I'm using nRF52 boards)
  4. west flash
  5. press button-0 to suspend thread.
  6. press button-1 to resume thread (here I've used both k_wakeup and k_thread_resume)
  7. See at UART terminal: system gets freeze
  8. Now edit the code. Enable k_thread_suspend(dummy_tid)

while (1) { k_sleep(K_MSEC(8000)); printk("flag = %d\n",dummy_flg); printk("%s\n", __func__); //k_thread_suspend(dummy_tid); }
10. Recompile and flash the code.
11. Press button-0 to see if suspend thread works or not.
12. Press button-1 to resume the thread.
13. See at UART terminal: while loop continuously running without considering k_sleep(8000)

Expected behavior
System should not get freeze if I call k_thread_resume after k_wakeup (Note: thread in sleep state here)

Impact
annoyance.

Logs and console output
Screenshot from 2020-09-25 13-21-00

Environment (please complete the following information):

  • OS: Linux)
  • Toolchain - Zephyr SDK
  • Commit a3fae2f (zephyr v2.2.99)
@rhardik rhardik added the bug The issue is a bug, or the PR is fixing a bug label Sep 25, 2020
@carlescufi carlescufi added the platform: nRF Nordic nRFx label Sep 25, 2020
@andyross
Copy link
Contributor

We don't generally do defensive state checking in kernel APIs. Indeed, resuming an unsuspended thread or waking up a non-sleeping one is a fatal error, and exactly the behavior you get is essentially undefined.

It's no different than trying to do things like double-remove a list item, etc...

I suppose someone could add some assertions in there to check state.

@pabigot
Copy link
Collaborator

pabigot commented Sep 25, 2020

I confirm that k_wakeup and k_thread_resume do not play well together, and do not seem to be no-ops when the thing they're supposed to do is unnecessary.

https://github.com/pabigot/rhardik-reproduce-zephyr-issues/commits/issue/28694 has three commits, the first updating the example to current Zephyr APIs, and the second making the test a little nicer to work with, and the third refining the distinction between the behaviors. Below is the summary of the functional changes and observed behavior.

Shorten the delay and print a running timestamp in the dummy thread.

Simplify compile-time selection of whether the dummy thread
self-suspends.

Use a counter to iterate through four responses to SW1 press:

  • only k_wakeup
  • only k_thread_resume
  • both k_wakeup and k_thread_resume
  • neither k_wakeup nor k_thread_resume

If the dummy thread does not self-suspend then k_wakeup alone, and k_thread_resume alone, both work. When both are invoked, the program hangs. This appears to be a bug.

If the dummy thread does self-suspend then

  • SW1 press 1 k_wakeup wakes from sleep
  • SW1 press 2 k_thread_resume unsuspends the thread
  • SW1 press 3 k_wakeup and k_thread_resume wakes and releases the dummy thread and it fast-loops, neither sleeping nor suspending. This appears to be a bug.

@pabigot pabigot added the priority: medium Medium impact/importance bug label Sep 25, 2020
@andrewboie
Copy link
Contributor

These are syscalls so even if we are lax about checking correct usage from supervisor mode contexts, a nefarious user thread shouldn't be able to cause corruption or DoS via the syscalls, at minimum needs to be 100% robust from user threads (and very likely worth making robust from all contexts). I'd say Medium priority is appropriate.

@nashif
Copy link
Member

nashif commented Oct 17, 2020

Indeed, resuming an unsuspended thread or waking up a non-sleeping one is a fatal error, and exactly the behavior you get is essentially undefined.

that is not true per the documentation:

/**
 * @brief Wake up a sleeping thread.
 *
 * This routine prematurely wakes up @a thread from sleeping.
 *
 * If @a thread is not currently sleeping, the routine has no effect.
 *
 * @param thread ID of thread to wake.
 *
 * @return N/A
 */
__syscall void k_wakeup(k_tid_t thread);

and

/**
 * @brief Resume a suspended thread.
 *
 * This routine allows the kernel scheduler to make @a thread the current
 * thread, when it is next eligible for that role.
 *
 * If @a thread is not currently suspended, the routine has no effect.
 *
 * @param thread ID of thread to resume.
 *
 * @return N/A
 */
__syscall void k_thread_resume(k_tid_t thread);

Both calls should just have no effect, it should not hang or crash.

nashif added a commit to nashif/zephyr that referenced this issue Oct 21, 2020
Do not try to resume a thread that was not suspended.

Fixes zephyrproject-rtos#28694

Signed-off-by: Anas Nashif <[email protected]>
nashif added a commit that referenced this issue Oct 22, 2020
Do not try to resume a thread that was not suspended.

Fixes #28694

Signed-off-by: Anas Nashif <[email protected]>
sjg20 pushed a commit to sjg20/zephyr that referenced this issue Oct 23, 2020
Do not try to resume a thread that was not suspended.

Fixes zephyrproject-rtos#28694

Signed-off-by: Anas Nashif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants