Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to CLOCK_REALTIME introduce clock.h and lf_atomic.h #346

Merged
merged 61 commits into from
Feb 11, 2024
Merged

Conversation

erlingrj
Copy link
Collaborator

@erlingrj erlingrj commented Feb 3, 2024

The main purpose of this PR is to move to pure CLOCK_REALTIME on UNIX systems as discussed here #289. It gained a little weight and now it includes:

  • Move everything to CLOCK_REALTIME
  • Remove the epoch_offset calculations which were needed for CLOCK_MONOTONIC
  • Handle adding/subtracting clock-sync offsets BELOW platform API instead of in tag.c This means that any time a platform should hand off a physical time, it must add the clock sync offset. And anytime it receives a physical time to sleep until it must subtract the clock-offset.
  • Moving the clock-sync dealings down there makes the implementation of wait_until a lot simpler. It enables using lf_cond_timedwait with just a normal instant_t without having to think about clock-sync offsets.
  • Make an own API for atomics. The macro stuff did not work because for GNU you could pass in any type, but for windows and for Zephyr++ you needed to pick a certain datatype. Now there is a separate API for 32bit data and 64bit data.
  • Make a common implementation of the atomics API for 32bit platforms using enabling/disabling interrupts. This requires having this implemented both for threaded and single-threaded platforms. (Previously it was just implemented for single-threaded)
  • Add -Werror (treat warnings as errors) to unit tests and RTI/. We should eventually add this to the core lib also.
  • Make NSEC(x) and friends return a interval_t not a long long. Now we can print NSEC(x) with PRINTF_TIME
  • Make lf_time_physical thread-safe and 32bit-safe using the atomics API.
  • Make updating the clock-sync offset 32-bit safe using the atomics API.

Older discussion:
After some consideration I decided to move the application of clock sync offset below the platform API. Previously this was done in tag.h, the layer above the platform API. The disadvantage was that we had an asymmetry. Because whenever you wanted to use lf_sleep_until or lf_cond_timedwait you had to remove the clock sync offset before calling it. Now the platform implementations has to apply the clock sync offset before handing off a timestamp, and it will also remove any clock_sync offset from absolute timestamps it receives for sleeping. It introduce some repetition but now the timestamps moving to/from the platform API are more consistent.

I also implemented a thread-safe and 32bit/64bit safe modification of the clock sync offset with a mutex. This is up for discussion. It makes it a lot costlier to read out the current time when we have clock-sync enabled.

There is still a FIXME with regards to how we best can make lf_time_physical monotonic and thread-safe on 32bit and 64bit systems. I think it is the same problem as the modification of the clock_sync offset, but more critical as it affects all systems.

Todos:

  • Fix docs in the code
  • Make lf_time_physical monotonicity enforcing thread-safe and 32bit-safe

@erlingrj erlingrj marked this pull request as draft February 3, 2024 21:01
@erlingrj erlingrj mentioned this pull request Feb 4, 2024
@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 4, 2024

@sberkun, I found the need to update the atomics implementations already now. For the 32bit platforms I think we have to do it by disabling interrupts. So I have moved lf_disable_interrupts_nested so it is defined for both threaded and single-threaded context. Will this work for the pico also? There was a comment indicating that core2 was disabled, what happens if core2 is the one making the call?

@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 4, 2024

@edwardalee @petervdonovan I am asking for your feedback on this. This got a little more involved than I had planned. I need a proper pass through the diff to add proper docs of the new functions etc. But would like to hear what you think about the proposed changes here.

@erlingrj erlingrj requested a review from edwardalee February 8, 2024 11:41
@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 8, 2024

@edwardalee I mark this as ready for review. I think a quite important and subtle point here is that this PR moves the responsibility of translating back and forth between the physical clock and the synchronized physical clock introduced by the clock synchronization that we support. It is now moved to the platform API implementation. This means that whenever you do _lf_clock_now (which is what lf_time_physical you get an instant_t that is corrected for clock synchronization. So all instant_ts in the runtime should be corrected for any clock sync. This means that the platform implementations must do two things:

  1. In _lf_clock_now: Apply any clock synchronization offset to the raw time read from their physical clock.
  2. Inlf_cond_timedwait, lf_sleep_until etc: Remove clock synchronization offsets if they need a raw absolute time to schedule a future wakeup.

This is a little subtle because some platforms do sleeping by calculating a sleep duration and then calling some lower-level API. In that case you should NOT subtract any clock sync offset since you will compute the duration relative _lf_clock_now which includes the clock-sync offset. So this does introduce some new challenges for implementing a new platform. And it is quite easy to make a mistake, I had made several mistakes related to applying/removing clock sync offsets between the raw physical clock and the instant_ts...

There is a way to avoid this, which is to add a layer between the platform API and the user-facing code. For _lf_clock_now we already do this, lf_time_physical is exactly this layer, and we could apply clock-sync offsets there. However for lf_cond_timedwait and lf_sleep_until we do not have such a layer. It would only affect the platform API which deals with time....

- Create clock.h, a layer between platform.h and the user/runtime
- Move applying/removing clock_sync offsets to clock.h
- Move monotonicity enforcement from tag.h to clock.h
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent to me. I've added a few small suggestions.

core/federated/clock-sync.c Outdated Show resolved Hide resolved
core/federated/clock-sync.c Outdated Show resolved Hide resolved
core/federated/clock-sync.c Outdated Show resolved Hide resolved
include/core/federated/clock-sync.h Outdated Show resolved Hide resolved
include/core/federated/clock-sync.h Outdated Show resolved Hide resolved
include/core/platform/lf_atomic.h Outdated Show resolved Hide resolved
include/core/platform/lf_atomic.h Outdated Show resolved Hide resolved
include/core/platform/lf_atomic.h Outdated Show resolved Hide resolved
include/core/platform/lf_atomic.h Outdated Show resolved Hide resolved
include/core/tag.h Outdated Show resolved Hide resolved
@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 8, 2024

Thanks for the thorough review @edwardalee. I am sorry that I have made some additional substantial changes which I need some approval for. As I discussed in the meeting I introduced now a layer between the timing APIs of platform.h and the rest of the runtime. I called it clock.h and it fetches physical clock readings from the platform API and adds/removes clock sync offsets and also ensures monotonicity. I also renamed from lf_clock_now to lf_clock_gettime

I think this is an improvement and makes thing more clear.

We can definitely support dropping in your own clock sync protocol at some point. For now I had to hide it with macros because clock-sync.h is tied with threaded platform and we cannot include it with an unthreaded platform. The user could also get his own clock sync by directly modifying the physical clock in an async thread also...

@erlingrj erlingrj requested a review from edwardalee February 8, 2024 20:09
@erlingrj erlingrj changed the title Move to CLOCK_REALTIME and move handling of clock sync offsets below the platform API. Move to CLOCK_REALTIME introduce clock.h and lf_atomic.h Feb 9, 2024
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned into a pretty big effort! It looks great to me. I've appended a few more small suggestions.

core/clock.c Show resolved Hide resolved
include/core/clock.h Outdated Show resolved Hide resolved
include/core/clock.h Outdated Show resolved Hide resolved
include/core/clock.h Outdated Show resolved Hide resolved
include/core/clock.h Outdated Show resolved Hide resolved
include/core/platform.h Outdated Show resolved Hide resolved
include/core/platform.h Outdated Show resolved Hide resolved
include/core/platform.h Outdated Show resolved Hide resolved
include/core/platform.h Outdated Show resolved Hide resolved
core/tag.c Outdated Show resolved Hide resolved
@erlingrj erlingrj enabled auto-merge February 11, 2024 10:10
@edwardalee
Copy link
Contributor

I talked to Kevin Stanton, who recently retired from Intel after leading for many years their PTP efforts. He told me that some Intel platforms are already enforcing monotonicity for CLOCK_REALTIME, and that it is providing the PTP time. He says he thinks this is the trend and that all Linux systems will eventually do this. This reinforces that this PR is really the right approach for the long term.

@lf-lang lf-lang deleted a comment from edwardalee Feb 11, 2024
@lf-lang lf-lang deleted a comment from edwardalee Feb 11, 2024
@lf-lang lf-lang deleted a comment from edwardalee Feb 11, 2024
@erlingrj
Copy link
Collaborator Author

I talked to Kevin Stanton, who recently retired from Intel after leading for many years their PTP efforts. He told me that some Intel platforms are already enforcing monotonicity for CLOCK_REALTIME, and that it is providing the PTP time. He says he thinks this is the trend and that all Linux systems will eventually do this. This reinforces that this PR is really the right approach for the long term.

Great to hear! Would be quite interesting to hear more from him on how we can make the LF runtime as relevant as possible for the applications they were targeting at his group at Intel. Also what hardware we should target for our future PTP/TSN testbeds.

@erlingrj erlingrj disabled auto-merge February 11, 2024 15:31
@erlingrj erlingrj merged commit 26eb346 into main Feb 11, 2024
28 checks passed
@erlingrj erlingrj deleted the clock-realtime branch February 11, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants