Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dynamic scheduler scaling more robust and configurable #2801

Merged
merged 1 commit into from
Jul 11, 2018

Conversation

dipinhora
Copy link
Contributor

Prior to this commit, dynamic scheduler scaling was not quite as
robust as we might like. When it came to waking sleeping threads,
the logic would send multiple wake signals in the hope that one
of them would work successfully and wake any sleeping thread that
might still be asleep. It was possible that even with sending
multiple signals, they could all be missed resulting in a thread
staying suspended when it should be awake.

This commit changes the logic to add a "check variable" that is
updated by threads after they wake up. This "check variable"
(active_scheduler_count_check) is a mirror of the current
active_scheduler_count and is used to confirm that a sleeping
thread has successfully woken up. This commit also changes the
logic to ensure that when threads are being woken up, they are
only sent a single signal (to minimize unnecessary work). If that
signal is missed, then scheduler thread 0 would notice that
active_scheduler_count != active_scheduler_count_check and send
another signal until the thread wakes up and updates the check
variable. This should ensure that it is not possible to have a
hanging thread due to missed signals. Overall, these changes
should make dynamic scheduler scaling more robust, including
allowing for scheduler thread 0 to also suspend. (Scheduler
thread 0 suspension is experimental because @slfritchie ran into
an issue related to programs hanging when using network IO and
suspending scheduler thread 0 a few months ago. This commit
should in theory resolve this due to the "check variable" but I
haven't had an opportunity to test and confirm whether it
actually does so or not.)

This commit also adds in a new command line option for dynamic
configutation of the threshold after which a thread is allowed to
suspend (and also separates it from the scheduler block threshold.
This new option is called --ponysuspendthreshold. This option
is needed because the current suspend threshold (about 1 ms) is
very aggresive and @slfritchie noted that scheduler threads
suspend and don't wake up even under high network load. This
option allows for the threshold to be increased (or decreased)
on a per application run time basis. Additionally, it likely
makes sense to change the default to be something more
"appropriate and balanced" based on some sort of fancy
heuristics/tests done with different values for the threshold.

@sylvanc sylvanc self-requested a review June 27, 2018 20:04
Copy link
Contributor

@sylvanc sylvanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Dipin, are you comfortable with merging this, in terms of how well tested it is?

Prior to this commit, dynamic scheduler scaling was not quite as
robust as we might like. When it came to waking sleeping threads,
the logic would send multiple wake signals in the hope that one
of them would work successfully and wake any sleeping thread that
might still be asleep. It was possible that even with sending
multiple signals, they could all be missed resulting in a thread
staying suspended when it should be awake.

This commit changes the logic to add a "check variable" that is
updated by threads after they wake up. This "check variable"
(`active_scheduler_count_check`) is a mirror of the current
`active_scheduler_count` and is used to confirm that a sleeping
thread has successfully woken up. This commit also changes the
logic to ensure that when threads are being woken up, they are
only sent a single signal (to minimize unnecessary work). If that
signal is missed, then scheduler thread 0 would notice that
`active_scheduler_count != active_scheduler_count_check` and send
another signal until the thread wakes up and updates the check
variable. This should ensure that it is not possible to have a
hanging thread due to missed signals. Overall, these changes
should make dynamic scheduler scaling more robust, including
allowing for scheduler thread 0 to also suspend. (Scheduler
thread 0 suspension is experimental because @slfritchie ran into
an issue related to programs hanging when using network IO and
suspending scheduler thread 0 a few months ago. This commit
should in theory resolve this due to the "check variable" but I
haven't had an opportunity to test and confirm whether it
actually does so or not.)

This commit also adds in a new command line option for dynamic
configutation of the threshold after which a thread is allowed to
suspend (and also separates it from the scheduler block threshold.
This new option is called `--ponysuspendthreshold`. This option
is needed because the current suspend threshold (about 1 ms) is
very aggresive and @slfritchie noted that scheduler threads
suspend and don't wake up even under high network load. This
option allows for the threshold to be increased (or decreased)
on a per application run time basis. Additionally, it likely
makes sense to change the default to be something more
"appropriate and balanced" based on some sort of fancy
heuristics/tests done with different values for the threshold.
@dipinhora
Copy link
Contributor Author

@sylvanc I am (based on my limited testing). But then again, I tend to be overconfident to a fault. It's probably better to have someone not-me do some sort of check to ensure I didn't break something without realizing it.

Also... rebased to fix merge conflict.

@SeanTAllen SeanTAllen added the changelog - added Automatically add "Added" CHANGELOG entry on merge label Jul 11, 2018
@SeanTAllen SeanTAllen merged commit a44cfaf into ponylang:master Jul 11, 2018
ponylang-main added a commit that referenced this pull request Jul 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - added Automatically add "Added" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants