Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix watchdog termination #341

Merged
merged 23 commits into from
Feb 13, 2024
Merged

Fix watchdog termination #341

merged 23 commits into from
Feb 13, 2024

Conversation

erlingrj
Copy link
Collaborator

@erlingrj erlingrj commented Jan 28, 2024

This PR address this issue. It also moves the watchdogs to the environment struct and should be merged with this PR in lingua franca.

This PR does a couple of things:

  • Solves race-condition with the watchdog threads on termination.
  • Creates a single, non-terminating thread per watchdog instead of terminating at each timeout
  • Registers the watchdog trigger at a logical tag equal to the expiration.
  • Brings watchdog support to the Zephyr target, for this to work we must know, compile time, of the number of watchdogs so we can pre-allocate a stack for the watchdog threads.

@erlingrj erlingrj marked this pull request as draft January 28, 2024 20:57
@erlingrj
Copy link
Collaborator Author

Zephyr fails because additional "user threads" must be specified. We have previously used compile defs for this. We might consider adding a LF_NUM_WATCHDOGS. A problem is however that the zephyr implementation does not work well with dynamically creating and destroying threads. Since I did add a condition variable we could probably make it work without terminating the watchdog thread at each timeout

@edwardalee
Copy link
Contributor

I think the idea with watchdogs was that on embedded platforms they would be implemented using a timer rather than thread. This will require an addition to the platform.h abstraction, I think, where the default implementation uses a thread. But with Zephyr, it seems like maybe there is a way to implement it with a timer?

Zephyr fails because additional "user threads" must be specified. We have previously used compile defs for this. We might consider adding a LF_NUM_WATCHDOGS. A problem is however that the zephyr implementation does not work well with dynamically creating and destroying threads. Since I did add a condition variable we could probably make it work without terminating the watchdog thread at each timeout

@erlingrj
Copy link
Collaborator Author

I remember we discussed an algorithm to implement watchdogs without threads, with an ISR. For Zephyr, I think using a thread is OK.

Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very useful changes. This should fix the long-standing bug but also properly puts the watchdog timer in the environment. However, I'm not sure, but I think there might be a flaw that could cause watchdogs to be randomly cancelled. Should be an easy fix, though. See detailed comments.

core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
@erlingrj
Copy link
Collaborator Author

Thanks Edward. You are right about the spurious wakeups of lf_cond_timedwait. I also just remembered the issue with lf_cond_timedwait needing an absolute wakeup-time wrt CLOCK_REALTIME, however lf_time_physical is wrt CLOCK_MONOTONIC. It is related to our discussion on #289. It means that before going to sleep on a cond_var we need to check the offset between CLOCK_MONOTONIC and CLOCK_REALTIME and adjust for it.

We would make it easier for ourselves if we didn't try to solving the clock synchronization problem within the LF runtime and instead just used CLOCK_REALTIME and require that the underlying system is configured correctly

@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 4, 2024

In order to properly implement the watchdog termination I need to first get #346 merged.

Copy link
Collaborator Author

@erlingrj erlingrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Edward

core/threaded/watchdog.c Outdated Show resolved Hide resolved
@erlingrj erlingrj marked this pull request as ready for review February 13, 2024 09:10
@erlingrj erlingrj requested a review from edwardalee February 13, 2024 09:10
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I've made some code clarity suggestions. Also, I'm not sure about the active field. I inserted a question.

core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Outdated Show resolved Hide resolved
core/threaded/watchdog.c Show resolved Hide resolved
include/core/threaded/watchdog.h Outdated Show resolved Hide resolved
@erlingrj
Copy link
Collaborator Author

@edwardalee thanks for the suggestions improving the readability of the main loop of the watchdog thread. I replied to a comment on the active field somewhere. The active field is only written to by the watchdog thread and it indicates whether it has stopped/timed-out or whether it is actively waiting for a time-out. If it is the former, then a call to lf_watchdog_start must also signal the cond-var to wake it up from the inactive state.

If this makes sense, lets get this merged

@edwardalee edwardalee merged commit 18fd6ee into main Feb 13, 2024
13 of 28 checks passed
@edwardalee edwardalee deleted the watchdog branch February 13, 2024 18:35
@edwardalee
Copy link
Contributor

Merged this and queued the corresponding merge in lingua-franca.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants