Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix in C decentralized execution with small after delays #280

Merged
merged 6 commits into from
Nov 3, 2023

Conversation

petervdonovan
Copy link
Contributor

@petervdonovan petervdonovan commented Sep 28, 2023

The bug is only apparent when there are connections with small delays of at least one microstep.

I claim responsibility for having recently introduced this bug.

Thanks to @jackykwok2024 for helping to uncover this bug.

@edwardalee
Copy link
Contributor

A question about this fix: With decentralized coordination, a federate should be able to continue to execute even if it gets completely disconnected from the network (no messages come in). It looks to me like this use of max_level_allowed_to_advance will prevent this? Also, if a connection between federates has a large after delay (and messages take a long time to arrive), won't this block the receiving federate until that (future) message arrives?

@petervdonovan
Copy link
Contributor Author

It looks to me like this use of max_level_allowed_to_advance will prevent this?

No, the MLAA will be updated when the STP offset expires. See this procedure, which was mostly contributed by Anirudh some months ago.

Also, if a connection between federates has a large after delay (and messages take a long time to arrive), won't this block the receiving federate until that (future) message arrives?

I'm not sure what you mean. If there is a large after delay, the receiving federate is still able to keep its logical time arbitrarily close to the physical time minus the STP offset, even if it has not received a message over the connection yet.

When I finish working on this PR, I'll update it with details about the nature of the bug that it fixes and I'll mark the PR ready for review.

@petervdonovan petervdonovan force-pushed the decentralized-small-delay-bugfix branch from 6d04e2b to 5d9f57a Compare September 30, 2023 00:36
@petervdonovan petervdonovan marked this pull request as ready for review September 30, 2023 06:51
@lhstrh
Copy link
Member

lhstrh commented Oct 2, 2023

Looks like merging #279 resulted in a conflict on this PR...

@lhstrh lhstrh requested a review from edwardalee October 2, 2023 19:23
The bug is only apparent when there are connections with small delays of
at least one microstep.
@petervdonovan petervdonovan force-pushed the decentralized-small-delay-bugfix branch from 55966ed to 6956651 Compare October 6, 2023 21:00
@petervdonovan petervdonovan marked this pull request as draft October 7, 2023 04:20
@petervdonovan
Copy link
Contributor Author

It turns out that what we had in main already seems to match the behavior that @edwardalee described in our meeting today; so, the difference in behavior between this branch and the spec is apparently just due to my erroneous changes in this branch.

The problems in main that are actually fixed by this PR are now:

  • In main, we wait for the STA offset to elapse even before we start time 0, which is unnecessary; this is fixed in this PR.
  • In main, we do not wait for the STAA offsets to elapse for nonzero-delay connections; this is my mistake and is fixed in this PR.

@edwardalee
Copy link
Contributor

As for non-zero delay connections, I believe that the STAA wait time should be adjusted down by the amount of logical delay. If the amount of delay is 0, of course, this makes no difference, so in this case, it will be treated the same as if there were no delay on the connection.

@petervdonovan petervdonovan marked this pull request as ready for review October 7, 2023 20:54
@lhstrh lhstrh enabled auto-merge November 3, 2023 05:15
@lhstrh lhstrh changed the title Fix bug in C decentralized execution with small after delays Bug fix in C decentralized execution with small after delays Nov 3, 2023
@lhstrh lhstrh added the bugfix label Nov 3, 2023
@lhstrh lhstrh changed the title Bug fix in C decentralized execution with small after delays Bugfix in C decentralized execution with small after delays Nov 3, 2023
@lhstrh
Copy link
Member

lhstrh commented Nov 3, 2023

@edwardalee, should we merge this?

Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but I've raised some questions.

core/federated/federate.c Show resolved Hide resolved
core/federated/federate.c Show resolved Hide resolved
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go ahead and merge this. Thanks!!

@lhstrh lhstrh merged commit a6dfded into main Nov 3, 2023
22 of 25 checks passed
@petervdonovan petervdonovan deleted the decentralized-small-delay-bugfix branch November 3, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants