Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watchdog Example not working as expected on a Nordic chip #29991

Closed
caco3 opened this issue Nov 13, 2020 · 6 comments · Fixed by #31925
Closed

Watchdog Example not working as expected on a Nordic chip #29991

caco3 opened this issue Nov 13, 2020 · 6 comments · Fixed by #31925
Assignees
Labels
area: Watchdog Watchdog bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: low Low impact/importance bug

Comments

@caco3
Copy link
Contributor

caco3 commented Nov 13, 2020

In the Watchdog Example the example shows that it is possible to trigger a callback on a watchdog trigger before the hardwired watchdog action. The callback re-feeds the watchdog and prints a whole line to the log.

At least on a Nordic nrf52 chip, this does not work as shown in the example!

Expected Outcome:
The log shows the line "Handled things..ready to reset" before the device gets reset

Seen Outcome:
Only the first 1 character is shown: A▒*** Booting Zephyr OS build zephyr-v2.4.0 ***
(Note: Character 3 and later are already the next startup)

https://devzone.nordicsemi.com/f/nordic-q-a/38926/wdt-and-error-handler states the following:

And when checking if the watchdog callback function is called, remember that this function is called two cycles of the 32 kHz clock before the reset is done, so it is a very little time to do something.

This leads me to the conclusion, that the Watchdog Example (at least on a Nordic nRF52) does not work as expected!

Suggestions:

  1. Add a note to the example indicating that this does not behave as expected on all devices
  2. IMO the line static bool handled_event; should be changed to static bool handled_event = false; to make sure it does not have a random value!
@caco3 caco3 added the bug The issue is a bug, or the PR is fixing a bug label Nov 13, 2020
@caco3
Copy link
Contributor Author

caco3 commented Nov 13, 2020

If you can confirm my conclusion, I am happy to write a patch for it.

@nashif nashif added the priority: low Low impact/importance bug label Nov 17, 2020
@pabigot
Copy link
Collaborator

pabigot commented Nov 17, 2020

If you can confirm my conclusion, I am happy to write a patch for it.

@caco3 Please do. There are multiple parts of the watchdog API that aren't entirely cross-vendor capable, so documenting it would be good. But:

static bool handled_event; should be changed to static bool handled_event = false; to make sure it does not have a random value!

No, we don't want that. C defines that static variables, like file-scope variables, are default-initialized to all zeros, so the value will be false when first encountered. An explicit assignment, if not optimized out because it has no effect, would waste flash memory for storing the initial value then writing it to RAM the first time the function is invoked.

@katsuster
Copy link
Member

I faced same problem on SiFive HiFive1 rev.b board.

No one has enough time to callback wdt_callback() because WDT_FLAG_RESET_SOC request
hardwired reset from watchdog. WDT will asserts reset signal immediately when it reached timeout.

I think this sample (and also tests/drivers/watchdog) is a little strange.
If sample want to show callback mechanism from Zephyr WDT framework,
this sample should use WDT_FLAG_NO_RESET instead of WDT_FLAG_RESET_SOC.

So in my opinion, shall we split sample into 2 parts:

  • part 1: using WDT_FLAG_NORESET with callback
  • part 2: using WDT_FLAG_RESET_SOC with hardwired reset

I will be appreciated any comments from you. Thanks.

@pabigot
Copy link
Collaborator

pabigot commented Jan 26, 2021

Copied from slack for posterity:

  • pabigot Topic for 2021-01-26: Watchdog Example not working as expected on a Nordic chip #29991 notes that Nordic allows scheduling a callback on a RESET_SOC watchdog configuration, but in practice this isn't useful as you get 61.2 us for the callback to complete its action before the reset occurs. I suspect other SoCs are also failing to reject this configuration when they should, causing the test to fail. I'd like to refactor the test so callback is tested using NO_RESET, but want to understand whether removing the callback-before-reset feature in Nordic is an acceptable solution. It would be a breaking behavioral change if anybody actually uses this.
  • jyates I'm not sure where to comment on this, so I'll do it here. I currently use the callback feature to store state before the reboot occurs. Primarily the current civil time so we can restore it on reboot, but also the thread we were in when the watchdog fired. As noted, 60uS isn't much time, so we don't bother trying to call LOG_PANIC , our function is basically 6 variable assignments.
  • pabigot Fooie. OK, will have to figure out another solution.

So I don't think this change is appropriate and the sample needs to be fixed in a different way.

@mnkp
Copy link
Member

mnkp commented Jan 26, 2021

The current implementation of watchdog sample application is incorrect and violates original design of watchdog API. It is not possible to reliably support scenario when watchdog timeout triggers a callback and SoC reset at the same time. The watchdog API was not designed to be used this way.

The scenario described by jyates where watchdog timeout triggers a callback followed by SoC reset can only be supported in a reliable manner by a multistage watchdog. This requires dedicated hardware support and can be enabled by CONFIG_WDT_MULTISTAGE Kconfig option. In this case the first stage should be configured to trigger a callback, and the second stage to trigger SoC reset.

Unfortunately, none of the existing drivers seems to implement support for multistage watchdog.

I propose to fix the sample application and improve watchdog API specification.

@pabigot
Copy link
Collaborator

pabigot commented Feb 3, 2021

PR #31925 addresses the reported issue and would close this. However, there are a variety of additional problems with that sample, and with several drivers, that mandate further attention early in 2.6, including:

  • Both sifive and gecko should return -ENOTSUP on the attempt to configure a callback on RESET_SOC since they don't support that configuration, and the documentation says that unsupported configurations should be diagnosed.
  • I think same70 should support the callback, but I'm not seeing it executed.

Those issues as well as the concerns in #29991 (comment) should be tracked in an enhancement issue for a review of the watchdog API and implementation.

@pabigot pabigot modified the milestone: v2.5.0 Feb 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Watchdog Watchdog bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants