-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] abstract clock subsystem: counter syntonization #60400
Conversation
@cfriedt relevant to you in the context of POSIX clocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real objections to the typedef, but it seems a little nonorthogonal? Right now kernel time is in ticks, because a "tick" is what kernel time is measured in by definition. Conversion of ticks to ns (both numerically and via the existing timeout mechanism) is already provided for anyone who needs it, and the utilities to do so are uniformly 64 bit (there are 32 bit converters for smaller units, but not ns).
Having a separate "kind" of time seems more confusing than helpful to me, honestly. There aren't any APIs that use this yet. What's the end goal here?
@andyross - I would almost suggest a 1:1 with Florian. I had a few broader picture discussions with him about some hi-res timer / clock things and was sold on it. Aside, those old tickets might actually get closed soon! |
Sure. There's just a long graveyard of attempts in this space, and I'd much rather we see an implementation/prototype arrive before making any changes to the public API. In particular an attempt to tie kernel timekeeping to a real unit is something we spilled a lot of blood trying to get away from, given that so many of our platforms use counters that don't convert cleanly into decimal units. |
@andyross : 100% agreed - the onus is on me to prove that it makes sense to decouple certain uses of time from existing kernel concepts (ticks/cycles) and that ktime_t is the right candidate to do so. I agree with @cfriedt that a F2F would be helpful. In any case I provide some more thoughts in writing - depending on your communication preference no need to read the following fine print, though, if you prefer F2F anyway...
The new type is not meant to represent kernel time "on the fly" but to represent time in structures and variables independently of the underlying clock source which more often than not will not be the kernel cycle counter.
That's correct, rounding from/to the ktime_t domain must be dealt with. Clock sources will have to provide precise cycle values, similarly to what the kernel timer does (and the kernel timer is the default clock source). Rounding results in +/- 0.5 ns jitter or alarm errors but error does not accumulate over time.
@cfriedt , @andyross : Do you agree that the waiting PR justifies ktime_t at least for the network subsystem? Would you prefer that we "hide" ktime_t somewhere in the network subsystem until we provide PRs in the POSIX/libc/TSCH area using that type in a broader sense? IMO the ktime_t concept has been very useful to refactor time across the network subsystem (including but not limited to the IEEE 802.15.4 subsystem):
The ultimate end goal is defined in an RFC. As soon as this PR is accepted the waiting PR will be added to the code base immediately introducing ktime_t as network time. But that alone would not justify introducing ktime_t in such an exposed place. We all need to be 100% convinced that at least the following features make sense in Zephyr in the short to mid term, should be decoupled from kernel time and use ktime_t-based intermediate clock sources instead:
These use cases have in common:
Next steps that follow immediately will be:
|
@kartben The build suddenly fails with "API_DEFINE: do not specify a non-Zephyr API for libc File:include/zephyr/sys/ktime.h". Any idea what that means? The only thing that changed is the comments. |
@keith-packard would know :) |
Whoa. Looks like that new check is busted. PR #60551 should help here. |
So... I won't stand in the way here as a simple typedef isn't a big deal. I'll just say I remain deeply suspicious that this is going to help and not hurt. Real hardware on real devices doesn't work with abstracted fine grained timekeeping due to the aliasing issues with e.g. 32768 Hz clocks. Real apps need to know what the tick rate is and tune their timer math to that using APIs that expose the conversions explicitly. Real libraries and subsystems[1] need to be aware of this limitation and not just paper it over. This is a pretty API. The hardware isn't pretty. The API is making promises the hardware can't keep. While, again, it's just a typedef, my suspicion is that creeping/contagious use of this kind of abstraction (vs. k_timeout_t, which requires explicit conversion at creation time) is going to take us right back to the world of constant annoying bad-timer-assumption bugs on nRF et. al. [1] In particular, the network layer is in fact a pretty bad citizen here precisely because it's not tick-aware! Now... it never will be? tl;dr: No -1 from me. But... I really think this needs to wait on some major subsystem work using it to land to see what the impact is going to be. |
I wonder if it's possible to isolate things to a testsuite or sample application first, get some of these concepts upstream that way without it being public API, and then create a separate PR that essentially flips a switch (moves the headers into public API and demonstrates the MVP use case). That way, it's still a 2-way door. A testsuite would kind of be ideal because we could test things on various pieces of hw, with e.g. a 32k timer. |
@andyross I think you're right. We need a deeper understanding of how this type is going to be embedded in a larger subsystem based on real code. I have the same concerns re aliasing and abuse of pseudo-precise nanosecond values. These problems need to be addressed right away - not later on. I propose I write more code and draw some pictures, add it all into this PR and then we continue the discussion.
@cfriedt Yes, this goes in the direction I proposed above when I said that we should probably hide this typedef in the network subsystem while it's only used there. I believe it is provably useful there - it's a big step forward, even if it never makes its way out of the network subsystem. Before we use the concept in POSIX clocks or libc, @andyross 's concerns need to be addressed anyway. UPDATE: Done - see #60401.
@andyross Do you propose to base TX/RX timestamps, timed TX and RX windows on opaque counter cycle values? I actually considered this, but so far I assumed that this would be a bad idea:
Therefore I currently assume that using nanoseconds + timespec as the common denominator for network APIs is the right approach even in the long term. But I might be blind here because I read these standards all day - so what is your opinion about this? It's important because it influences the clock subsystem design quite a bit. |
I have watched other projects adopt a general 64-bit nanosecond time mechanism with good results; it avoids numerous errors where scaling is performed incorrectly or lost entirely. Plus it encourages the use of 64-bit types instead of various kludges to try and fit times into 32-bit values. As long as the model exposes the resolution of every time source, it seems like the resulting programming model is generally easier to use correctly. |
For clarity here: I'm not saying an API that exposes fine-grained monotonic time is a bad idea. I'm not saying it doesn't belong in Zephyr somewhere. I'm saying it's a poor fit for our hardware and that it almost certainly doesn't belong in the kernel as the "official" way we keep time. Again, we spent years dealing with test code that would naively compute timeouts in milliseconds in an obvious way and then fall over mysteriously on Nordic because something happened too early or took too long. And the solution was to say "No, don't give me an integer. Instantiate a timeout once and the system will ensure it's an opaque struct so you don't shoot your foot trying to do more math on it." If we could assume all systems had a MHz-scale timekeeping device, my answer would be different. But... we can't. |
To be blunt: if you're basing it on something other than ticks, regardless of what your API says, you probably have bugs somewhere. The result of |
Prepare for instantiable system clock by concentrating its state in a struct. Signed-off-by: Florian Grandel <[email protected]>
Minimally increase stack size on the kernel pipe test as the additional indirection required by the introduction of a pluggable kernel timer API seems to exceed the required stack size just by a few bytes. Also fixes a test isolation issue: One of the tests didn't prepare it's own precondition which made it impossible to execute in isolation. Signed-off-by: Florian Grandel <[email protected]>
Adds the elapsed() and set_timeout() operations to the timeout api and decouples all calls to these operations from the system clock. This prepares for future pluggability of these operation. Signed-off-by: Florian Grandel <[email protected]>
Decouples the activation of the system clock from the activation of the generic timeout API so that the latter can be enabled independently from the former. Signed-off-by: Florian Grandel <[email protected]>
Exposes the newly created multi-instance timeout API via timeout_q.h header file. Signed-off-by: Florian Grandel <[email protected]>
The recent refactoring of time conversion functions to macros left a few references orphaned in the docs. Signed-off-by: Florian Grandel <[email protected]>
Now that "clocks" can mean system clocks or abstract clocks, the "clock" concept has become ambivalent. Also there is an inconsistency between system clock header files (that use the sys_ prefix) and docs (that do not use the prefix). Both, the ambivalence and the inconsistency, are being fixed by prefixing the system clock doc group with "sys". The change also renames "timeutil_unit_apis" which is also inconsistent and not really readable anyway. Signed-off-by: Florian Grandel <[email protected]>
Updates the existing net time docs to refer to the recently introduced timepoint concept to further clarify what is meant by tick counter values. Signed-off-by: Florian Grandel <[email protected]>
The header files `zephyr/sys/time_units.h` and `zephyr/sys_clock.h` had both, generic time conversion and system clock specific utilities. This change creates a header file that will provide generic utilities to represent and convert time to be used independently from the system clock. The newly created `zephyr/sys_time.h` header must not depend on any other header files except for low level generic type declarations, the toolchain and low level utilities so that it can be included from `time_units.h` and `sys_clock.h` without include loops. The header is named `zephyr/sys_time.h` to avoid confusion with the POSIX `sys/time.h` header. The chosen name is consistent with existing sys_* headers. Signed-off-by: Florian Grandel <[email protected]>
Fixes a minor bug that prevented doxygen from correctly recognizing the function to which the doc comment was attached. Signed-off-by: Florian Grandel <[email protected]>
Moves generic time to timeout conversion utils to the general time header. Signed-off-by: Florian Grandel <[email protected]>
Moves generic time converter generator macros to the general time header. Signed-off-by: Florian Grandel <[email protected]>
Slightly improves generic time converter macro documentation. Signed-off-by: Florian Grandel <[email protected]>
The TIME_CONSTEXPR definition was obsoleted by d8b276e and can be removed now. Signed-off-by: Florian Grandel <[email protected]>
Improve readability of inline documentation by re-ordering related comments. Signed-off-by: Florian Grandel <[email protected]>
Moves public generic time constants that are not specific to the system clock into the generic time header. Signed-off-by: Florian Grandel <[email protected]>
Moves private generic time constants that are not specific to the system clock into the generic time header. Signed-off-by: Florian Grandel <[email protected]>
Moves all time point related utilities to the generic time utilities as these are not specifically bound to the system clock. Signed-off-by: Florian Grandel <[email protected]>
Now that the content of the sys_clock header is reduced to what is really specific to the system clock, the condition macros and documentation structure can be simplified and cleaned up. Signed-off-by: Florian Grandel <[email protected]>
Introduces definitions for special timepoints corresponding to a timepoint that can "never" occur and a "zero" timepoint. Signed-off-by: Florian Grandel <[email protected]>
Introduces a dependency to a specific timout API into the timer structure. This allows multiplexed timers to be defined against arbitrary tick counter sources. Also fixes a minor locking issue that would allow concurrent read/write access on timeout nodes due to insufficient locking. Signed-off-by: Florian Grandel <[email protected]>
Introduces the API for a syntonized network uptime reference including the underlying hybrid (sleep/high-resolution) counter. Also prepares the IEEE 802.15.4 driver API as a client of this API. Signed-off-by: Florian Grandel <[email protected]>
Optionally introduces the network uptime API into the IEEE 802.15.4 driver API. Signed-off-by: Florian Grandel <[email protected]>
Padding section alignment for 64bit-aligned timeout struct. Signed-off-by: Florian Grandel <[email protected]>
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
Proposes a well-defined intermediate concept of scalar syntonized nanosecond resolution time with overflow protection above low-level counters/cycles/ticks and below higher level time abstractions (timescales, calenders, etc.). Details see the discussion below.
Currently this happens in the net subsystem but in principle this can and may be promoted to something more generic as it is an implementation of layers 1 (counter) and 2 (syntonization) of the proposed clock subsystem RFC which is itself relevant to the POSIX roadmap.
The infrastructure discussed in this RFC will first be used for TSCH TDMA.