Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed race-condition in _lf_replace_template_token #2082

Merged
merged 5 commits into from
Nov 6, 2023

Conversation

erlingrj
Copy link
Collaborator

@erlingrj erlingrj commented Nov 3, 2023

This addresses the problem described in #2081

The problem was that we moved tokens from the trigger structs to the action structs outside a critical section. This created a race condition with external asynchronus contexts trying to schedule events onto the very same action struct.

This simple solution does the moving of tokens inside i critical section. I think another solution might be to re-think the design. Particularily.

  1. Why does lf_schedule put the token on BOTH the event queue AND the action struct? The runtime will later move the token from event queue -> trigger struct and then in the reaction preambles, it moves it from the trigger struct to the action struct.
  2. Why is the token moved from trigger-struct to action-struct in the code-generated preamble. It would be more natural to do that from wihin the runtime (which is in a critical section)

This is code-generated into reactions moving the
token from the trigger struct to the action struct.
Race condition with asynchronous scheduling
onto the same action struct.
@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 3, 2023

@edwardalee

critical section around code-gen token moving
@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 3, 2023

I had to include platform.h for the reaction bodies in order to code-generate a critical section in the reaction preamble. It suggests that maybe another solution is better. Like having the runtime move the token instead of code-generating it.

Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You found a needle in a haystack. Terrific job, @erlingrj.

@lhstrh lhstrh added this pull request to the merge queue Nov 3, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 3, 2023
@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 3, 2023

@lhstrh we can wait a little with merging. There might be a better way of addressing this

Copy link
Collaborator

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, but I think the critical section needs to be a little bigger or the action struct could become inconsistent.

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 4, 2023

@edwardalee I expanded the critical sections according to your comments.

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 5, 2023

@edwardalee, have you seen this test failure of ChainWithDelay before?

 Federate 2: FATAL ERROR: Attempted to insert reactions for a trigger that had an intended tag that was in the past. This should not happen under centralized coordination. Intended tag: (2500000, 0). Current tag: (3000000, 0)).

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 5, 2023

I was able to reproduce it here and I have the trace of federated communication. Should we discuss it in a separate issue? It might be related to fixes you are working on? The RTI erroneously gives a PTAG to a federate

@edwardalee
Copy link
Collaborator

Yes, I'm seeing these kinds of failures. There is an easy solution, but it will mask a more fundamental problem. The easy solution is to prohibit the use of PTAG except for zero-delay cycles. We will want to do that eventually, but first we should fix the handling of PTAG because with this easy fix, it will be hard to produce these errors.

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 5, 2023

Separate issue for the ChainWithDelay PTAG problem: #2084

@erlingrj erlingrj requested a review from edwardalee November 6, 2023 08:49
Copy link
Collaborator

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's merge this.

@edwardalee edwardalee added this pull request to the merge queue Nov 6, 2023
Merged via the queue into master with commit 090789b Nov 6, 2023
41 checks passed
@edwardalee edwardalee deleted the token-racing-cond branch November 6, 2023 17:26
@lhstrh lhstrh added the bugfix label Jan 23, 2024
@lhstrh lhstrh changed the title Fix race-condition in _lf_replace_template_token Fixed race-condition in _lf_replace_template_token Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants