-
-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix early quiescence/termination bug #4550
Conversation
Prior to this commit, the logic to detect quiescence had a race condition in relation to dynamic scheduler scaling and it was possible for the runtime to incorrectly detect quiescence and termninate early if a scheduler thread suspended at just the right time. This commit changes the quiescence logic to keep an accurate track of exactly how many scheduler threads are active at the time the quiescence detection protocol begins so it can ensure that any scheduler threads suspending or unsuspending can no longer cause an incorrect determination that might lead to early termination of the runtime.
not included as part of this PR, but stopping the ASIO thread should likely be moved to happen after all the scheduler threads have stopped... maybe worth doing as a follow on thing.. |
Hi @dipinhora, The changelog - fixed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks. |
That sounds like a good idea |
send_msg_all_active(sched->index, SCHED_CNF, sched->ack_token); | ||
// save the # of active schedulers to expect ACK's from | ||
sched->ack_count = send_msg_all_active(sched->index, SCHED_CNF, sched->ack_token); | ||
pony_assert(sched->ack_count > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assertion failed during a CI run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep.. i missed an edge case.. will have a fix soonish...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix pushed
Prior to this commit, the logic to detect quiescence had a race condition in relation to dynamic scheduler scaling and it was possible for the runtime to incorrectly detect quiescence and termninate early if a scheduler thread suspended at just the right time.
This commit changes the quiescence logic to keep an accurate track of exactly how many scheduler threads are active at the time the quiescence detection protocol begins so it can ensure that any scheduler threads suspending or unsuspending can no longer cause an incorrect determination that might lead to early termination of the runtime.