Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed preprocessor directives for clock sync #425

Merged
merged 17 commits into from
May 17, 2024
Merged

Conversation

edwardalee
Copy link
Contributor

The preprocessor directives controlling clock synchronization were erroneously defaulting to having no clock synchronization at all. Only by setting the clock-sync parameter of the target to on or init could you actually get clock synchronization to work. This PR fixes this so that the default clock synchronization is to perform clock synchronization at initialization and then stop synchronizing.

This will require bumping reactor-c in lingua-franca to have any effect.

@edwardalee edwardalee requested a review from erlingrj May 12, 2024 06:25
@erlingrj
Copy link
Collaborator

Are you sure that we want clock sync to be the default? For all federated programs running on the same host, this would just introduce a slight timing error into the program. For federated programs where there already is clock sync running from NTP/PTP, this would also possibly introduce errors. If we do the LF initial sync at a time where the hosts are a little "off", then later NTP/PTP would presumably bring the clocks together, however since we did an initial clock sync, we would keep this initial bias.

I think that it instead should be opt-in.

@edwardalee
Copy link
Contributor Author

Yes, initial clock sync should be the default. NTP is not very precise, and any clock sync error will cause behavior that users won't expect, like waiting a long time before starting. When running on the same host with the RTI, clock sync is skipped.

@erlingrj
Copy link
Collaborator

I think if the system is running NTP then doing an initial clock sync will only cause problems. Imagine that the two computers for some has a 5 sec offset initially, I think that is unlikely if they are running NTP though which should give accuracy to the millisecond level. If we initially add a constant bias of 5 sec to the system clock then this will turn into a 5sec clock sync error once NTP steers in the clocks correctly. I think this will be much more confusing to a user.

I think our application-level clock sync should only be used if the system is not running any other clock sync clients. So I think it should only be default if we believe that this type of system is the most probable target for distributed LF.

@edwardalee
Copy link
Contributor Author

edwardalee commented May 12, 2024

I'm not sure about this argument. Our clock synchronization is with a clock that is much closer, typically, than what NTP is using. When it is turned on, we are seeing clock sync errors on the order of tens of microseconds, a level not achievable with NTP. You are right that if NTP is running poorly and then suddenly starts running well while our program is running that this could create problems. But is this likely?

In any case, the default policy is a one-line change in this PR. The PR is still necessary because without it we can't actually set any default policy.

Copy link
Collaborator

@erlingrj erlingrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, not sure what the CI problem is, I cannot seem to re-run any of the tests either...

@edwardalee
Copy link
Contributor Author

edwardalee commented May 13, 2024

Looks like CI is failing because a Python federate doesn't expect a clock sync signal from the RTI. Apparently, the Python federates are not getting the memo to use "init" clock synchronization.

@erlingrj erlingrj force-pushed the clock-sync-initial branch from e5fef0c to f35c5f1 Compare May 13, 2024 17:35
@erlingrj
Copy link
Collaborator

@edwardalee I believe this should solve our issues. My changes include:

  1. lf_time_add with saturation and overflow/underflow detection
  2. Provide clock_sync_add/subtract empty implementations in clock-sync.c for the scenario with FEDERATED but CLOCK_SYNC_OFF
  3. Changes to the code-generator so that it defines LF_CLOCK_SYNC_OFF when we are on the same host as the RTI

Copy link
Contributor Author

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looks ready to me.

tag/api/tag.h Outdated Show resolved Hide resolved
@edwardalee edwardalee enabled auto-merge May 15, 2024 14:54
core/tag.c Show resolved Hide resolved
core/tag.c Show resolved Hide resolved
@edwardalee edwardalee added this pull request to the merge queue May 17, 2024
Merged via the queue into main with commit 0a42722 May 17, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants