Multiple timers, wall-clock time, Monotonic ratio, and Mode Changes #23

Open
perlindgren opened this issue Nov 6, 2019 · 1 comment

@perlindgren
Contributor

While the title touches on several topics, they are somewhat connected/interdependent.
In the future we may want to support multiple timers (with separate timer queues). This would reduce priority inversion, allow different prescalers to accommodate the trade-off between range and accuracy, and reduce the worst-case overhead of timer queue operations.

However, this raises the question of (and the need for) a common notion of time (wall-clock). This would give a common meaning to time offsets when spawning tasks in the future (all relating to the same time base, even if queued and dispatched by different timers).
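To make the range/accuracy trade-off and the shared time base concrete, here is a minimal host-side sketch. `TimerBase`, its fields, and the microsecond unit are hypothetical names chosen for illustration, not part of RTFM. The idea is that a spawn offset is expressed once in wall-clock units and converted to ticks by whichever timer queue ends up holding the message.

```rust
/// Hypothetical description of one hardware timer driving a timer queue.
/// (Names and units are assumptions for illustration, not RTFM API.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TimerBase {
    /// Input clock of the timer in Hz.
    pub clock_hz: u64,
    /// Prescaler dividing the input clock (must be non-zero).
    pub prescaler: u64,
}

impl TimerBase {
    /// Effective tick rate in Hz after the prescaler.
    pub fn tick_hz(&self) -> u64 {
        self.clock_hz / self.prescaler
    }

    /// Convert a wall-clock offset in microseconds to timer ticks,
    /// rounding down. A u128 intermediate avoids overflow.
    pub fn ticks_from_micros(&self, micros: u64) -> u64 {
        ((micros as u128 * self.tick_hz() as u128) / 1_000_000) as u64
    }

    /// Longest offset (in microseconds) that a counter of `bits` width
    /// can represent before wrapping.
    pub fn range_micros(&self, bits: u32) -> u64 {
        let max_ticks = (1u128 << bits) - 1;
        ((max_ticks * 1_000_000) / self.tick_hz() as u128) as u64
    }
}
```

With an assumed 16 MHz clock, a prescaler of 1 gives fine resolution but a 16-bit counter wraps after about 4 ms, while a prescaler of 16 000 gives 1 ms resolution and a range of about 65 s. The same 2.5 ms offset maps to 40 000 ticks on the first timer and 2 ticks on the second.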

This also relates to the problem of the ratio between a timer and the timing reference (be that the system clock or the wall-clock). If we allow the system clock, the timer prescaler, or the clock tree to change, we are inducing a mode change in the system.

This implies a problem for the timing semantics (it's unclear at what point in time a queued (outstanding) event should be dispatched).
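One possible answer, sketched here purely as an assumption, is to preserve the wall-clock time remaining for each queued event and rescale its remaining ticks when the tick rate changes. The function name `rebase_remaining` is hypothetical. Rounding up is a design choice so that an event is never dispatched earlier than requested.

```rust
/// Rescale the ticks remaining for a queued event when the timer's tick
/// rate changes from `old_hz` to `new_hz`, keeping the remaining
/// wall-clock time (approximately) unchanged.
/// Hypothetical helper; `old_hz` must be non-zero.
pub fn rebase_remaining(remaining_ticks: u64, old_hz: u64, new_hz: u64) -> u64 {
    let num = remaining_ticks as u128 * new_hz as u128;
    let den = old_hz as u128;
    // Round up: the event may fire slightly late, but never early.
    ((num + den - 1) / den) as u64
}
```

Other semantics are equally conceivable, e.g. keeping the tick count fixed and letting the wall-clock deadline stretch. The sketch only illustrates that the choice has to be made explicitly.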

So it boils down to defining suitable timing semantics (one could consider wall-clock time as the reference). For practical reasons, we cannot expect to have exact wall-clock time unless relying on an RTC, and that might not be fine-grained enough (and might also be costly to access), so an approximate wall-clock should do in most cases. Network time might be something that certain applications need, but that is likely best handled at the application level.

So what about mode changes in the system? What are the general goals, challenges and possible solutions?

A mode represents an operating condition (RTFM now has two modes, init and run). In the setting of an IoT application, we might have a low-power mode (where the system awaits some stimuli), and one (or several) active modes. In the low-power mode, we might have access only to a limited set of timers/timer queues (e.g., an RTC-based queue for waking the system every 24h to poll some sensor). When woken by the RTC, we can have another set of timers/queues for performing the measurement(s), and perhaps there is a third mode triggered by a radio wakeup, where we have a set of timers/queues for the management of radio communication etc. Notice here that each mode (or state of operation) may well have a unique set of tasks and timing requirements. Tasks (and resources) may be shared between modes (e.g., a task to read some sensor may be scheduled under several modes, e.g., on behalf of the RTC wakeup, or by a request from the radio).

Take another use case, a control application. Under normal operation, we have a set of tasks, resources and constraints, e.g., we have access to sensors x, y and z. What if sensor z is faulty (e.g., the wire to z is broken)? Then we go into a limp-home mode, where we have access to only sensors x and y. In this mode the resource z is not available, but it should still be possible to make a best-effort control. What if x and y are also broken? Then we need to do an emergency stop (as the system can no longer be controlled, and stability may not be possible). Emergency stop should then be a third mode of operation.

One can think of zillions of such different use cases. Can RTFM support them? Well, currently RTFM supports them only at the application level. It is currently up to the implementer to "encode" the mode of operation; resources can be masked behind Option<T> etc. It's very generic, but also very weak regarding guarantees.
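As an illustration of the application-level encoding described above, the following is a minimal sketch of the control use case. All names (`Mode`, `Sensors`, `select_mode`, `control_output`) and the averaging "controller" are made up for the example. Treating every sensor combination other than "all present" and "only z missing" as an emergency stop is also an assumption.

```rust
/// Hypothetical application-level mode encoding (not RTFM API).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Normal,
    LimpHome,
    EmergencyStop,
}

/// Sensor readings masked behind `Option`: `None` means "faulty/unavailable".
pub struct Sensors {
    pub x: Option<f32>,
    pub y: Option<f32>,
    pub z: Option<f32>,
}

/// Derive the mode of operation from which sensors are available.
pub fn select_mode(s: &Sensors) -> Mode {
    match (s.x, s.y, s.z) {
        (Some(_), Some(_), Some(_)) => Mode::Normal,
        (Some(_), Some(_), None) => Mode::LimpHome,
        // Assumption: anything else cannot be controlled safely.
        _ => Mode::EmergencyStop,
    }
}

/// Placeholder control law: the mean of the available readings.
/// Returns `None` when the system must stop.
pub fn control_output(s: &Sensors) -> Option<f32> {
    match select_mode(s) {
        Mode::Normal => Some((s.x? + s.y? + s.z?) / 3.0),
        Mode::LimpHome => Some((s.x? + s.y?) / 2.0),
        Mode::EmergencyStop => None,
    }
}
```

Note the weakness: nothing at compile time stops the `LimpHome` arm from reading `s.z`. The invariant "z is unavailable in limp-home mode" lives only in the programmer's head and in run-time `Option` checks.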

Alternatively, RTFM could be extended to natively capture modes (e.g., similarly to the multi-core extension, where we have different task sets/resources, we could define different modes). Code reuse can still be achieved by using the tasks only as trampolines to shared code.

A fundamental problem here is mode changes: what semantics do we want? What about outstanding messages? E.g., a request to transition back to a low-power state (for the IoT case) should likely be graceful to outstanding messages (they should be allowed to complete). In the case that we have detected a faulty z sensor, we need to transition to the limp-home state. In that case we want to carry over the resources to the new mode (except the z resource, which is broken). In the case that we need an emergency stop, the mode change may need to be immediate, but we may still need to finish off already started tasks before the mode change takes place.

As seen, the semantics wanted/required may vary from case to case, so it may be hard to foresee all possible requirements. Hopefully, the mode change semantics can be pinned down to a set of features. E.g., criteria for when mode changes should be taken (regarding outstanding messages and started tasks). Criteria for how new messages are handled (those emitted on behalf of already started tasks vs. those from scheduled but not yet started tasks). Assumptions on resources/states during and after the transition. E.g., resuming after deep sleep may render all resources void (essentially a cold start if memory is not retained), while going into limp-home mode can carry over all resources. Once the semantics have been pinned down, we can think about syntax: for transitions, how should carried-over resources be marked, etc.? Can we think of hybrid transitions, where outstanding messages may execute under the new mode (e.g., logging messages may be allowed to carry over and complete under the new mode)?
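To give the "set of features" idea some shape, here is a small host-side sketch of a per-message policy for messages still queued when a mode change is requested. `Outstanding`, `Plan`, and `plan_transition` are hypothetical names. It uses `Vec` for brevity, which a real no_std implementation would replace with statically sized queues.

```rust
/// Hypothetical policy for a message that is queued but not yet started
/// at the time of a mode change.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Outstanding {
    /// Let the message run to completion before leaving the old mode.
    Complete,
    /// Drop the message; it is never dispatched.
    Discard,
    /// Keep the message queued and dispatch it under the new mode.
    CarryOver,
}

/// Result of splitting the outstanding queue according to the policy.
#[derive(Debug, PartialEq)]
pub struct Plan<M> {
    pub run_in_old_mode: Vec<M>,
    pub run_in_new_mode: Vec<M>,
}

/// Split the outstanding messages, preserving their queue order.
pub fn plan_transition<M>(
    queue: Vec<M>,
    policy_of: impl Fn(&M) -> Outstanding,
) -> Plan<M> {
    let mut plan = Plan {
        run_in_old_mode: Vec::new(),
        run_in_new_mode: Vec::new(),
    };
    for msg in queue {
        match policy_of(&msg) {
            Outstanding::Complete => plan.run_in_old_mode.push(msg),
            Outstanding::CarryOver => plan.run_in_new_mode.push(msg),
            Outstanding::Discard => {}
        }
    }
    plan
}
```

A graceful transition to low power would map everything to `Complete`, while a hybrid transition could map logging messages to `CarryOver` and the rest to `Complete` or `Discard`. In a framework with native mode support, such a mapping would ideally be declared statically rather than computed at run time.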

Why would we want this kind of support from RTFM? To put it simply: static guarantees and robustness.

In the context of control applications, fault mode handling is very tricky to get right by hand.
Similarly, in the context of IoT, where we want to exploit power modes aggressively, it takes a lot to ensure robustness.

In both cases, a well-designed framework that allows for correct-by-construction design could be extremely helpful. While it's a grand challenge to come up with the framework, it would allow the programmer to fearlessly design applications where, based on the task/resource/mode change model, the framework would reject unsound models (e.g., preventing the use of data from a faulty sensor, or relying on some data in memory that would have been erased due to non-retained memory during sleep).

@perlindgren
Contributor Author

Regarding mode changes:

RTFM follows the procedure of cortex-m-rt, initializing the memory before calling into init.
This mechanism ensures that the system is in a well-defined state when entering init, and it happens on cold boot/reset.

However, we might think of handling errors (caused by panic! and NMIs) through alternative boot/reset behaviour, with a different set of partially overlapping tasks, and/or resources.

Overlapping resources could be marked as persistent or initialized. This would allow us to carry over state from the previous mode into the new mode. Persistent resources need some caution if the mode transition is due to a panic! or NMI, as the error could have occurred halfway through a non-atomic resource update. In any case, that is the information we have at hand, and all we can do is try to deal with it intelligently.
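A rough sketch of the persistent/initialized distinction follows, with one conceivable way of dealing with a half-finished update: an "update in progress" flag that causes a persistent resource to fall back to its initial value after a fault. The names `Carry` and `Resource`, and the flag-based recovery, are assumptions for illustration and are not something RTFM provides.

```rust
/// How a resource is treated on a mode change (hypothetical marking).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Carry {
    /// Value survives the transition.
    Persistent,
    /// Value is reset to its initial value on every transition.
    Initialized,
}

pub struct Resource<T: Clone> {
    value: T,
    initial: T,
    carry: Carry,
    /// Set while a non-atomic update is in progress.
    updating: bool,
}

impl<T: Clone> Resource<T> {
    pub fn new(initial: T, carry: Carry) -> Self {
        Resource { value: initial.clone(), initial, carry, updating: false }
    }

    pub fn get(&self) -> &T {
        &self.value
    }

    /// Mark the start of a (possibly multi-step) update.
    pub fn begin_update(&mut self) -> &mut T {
        self.updating = true;
        &mut self.value
    }

    /// Mark the update as complete and consistent.
    pub fn end_update(&mut self) {
        self.updating = false;
    }

    /// Apply the carry-over rule when entering a new mode.
    /// `after_fault` is true if the transition was caused by a panic/NMI.
    pub fn enter_mode(&mut self, after_fault: bool) {
        let torn = after_fault && self.updating;
        if self.carry == Carry::Initialized || torn {
            self.value = self.initial.clone();
        }
        self.updating = false;
    }
}
```

On real hardware both the value and the flag would have to live in memory that is not re-initialized across the reset, and the flag writes would need appropriate ordering. The sketch ignores both concerns.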

Implementation:
We need a stable way to determine the reset cause, in order to select the alternative mode to boot into (could be, e.g., init/normal, on_panic, on_wakeup, on_nmi, on_watchdog). There is some HW support, but it may be vendor dependent.

Ideally you should be able to come up with your own modes as well.
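The selection step could be sketched as a trait that hides the vendor-specific part, plus a pure mapping from reset cause to boot mode. Everything here is hypothetical: the trait `ResetCauseSource`, the enums, and in particular the bit layout of `MockVendor`, which does not correspond to any real device.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ResetCause {
    PowerOn,
    Panic,
    Wakeup,
    Nmi,
    Watchdog,
}

/// Boot modes named after the examples in the text above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BootMode {
    Init,
    OnPanic,
    OnWakeup,
    OnNmi,
    OnWatchdog,
}

/// Vendor-specific part: how to find out why we (re)booted.
pub trait ResetCauseSource {
    fn reset_cause(&self) -> ResetCause;
}

/// Vendor-independent part: which mode to boot into.
pub fn select_boot_mode(src: &impl ResetCauseSource) -> BootMode {
    match src.reset_cause() {
        ResetCause::PowerOn => BootMode::Init,
        ResetCause::Panic => BootMode::OnPanic,
        ResetCause::Wakeup => BootMode::OnWakeup,
        ResetCause::Nmi => BootMode::OnNmi,
        ResetCause::Watchdog => BootMode::OnWatchdog,
    }
}

/// Mock device with a made-up flag layout:
/// bit 0 = watchdog, bit 1 = wakeup, bit 2 = panic marker, bit 3 = NMI marker.
pub struct MockVendor {
    pub flags: u32,
}

impl ResetCauseSource for MockVendor {
    fn reset_cause(&self) -> ResetCause {
        if self.flags & 0b0001 != 0 {
            ResetCause::Watchdog
        } else if self.flags & 0b0010 != 0 {
            ResetCause::Wakeup
        } else if self.flags & 0b0100 != 0 {
            ResetCause::Panic
        } else if self.flags & 0b1000 != 0 {
            ResetCause::Nmi
        } else {
            ResetCause::PowerOn
        }
    }
}
```

Supporting user-defined modes would require the mapping itself to be extensible, for instance by letting the application supply the `match`. The sketch keeps it fixed for brevity.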

We need a stable way to generate mode changes. (Just jumping to the reset vector is not enough I assume.) There might be some hardware specifics involved.

Important here is that mode changes should have well-defined semantics and be amenable to static analysis. This way we may bound the response time for events triggering mode changes, even if the analysis needs to take both the reception process and the reset process into account. This is especially important if we want to bound the response time for wake_up events and critical errors in control systems.

An example of a soft reset can look like this:
https://mcuoneclipse.com/2015/07/01/how-to-reset-an-arm-cortex-m-with-software/

HW specifics could be abstracted through traits implemented by some (RTFM) HAL.
