Make dynamic scheduler scaling more robust and configurable #2801

dipinhora · 2018-06-21T00:12:10Z

Prior to this commit, dynamic scheduler scaling was not quite as
robust as we might like. When it came to waking sleeping threads,
the logic would send multiple wake signals in the hope that one
of them would work successfully and wake any sleeping thread that
might still be asleep. It was possible that, even with multiple
signals sent, all of them could be missed, leaving a thread
suspended when it should be awake.

This commit changes the logic to add a "check variable" that is
updated by threads after they wake up. This "check variable"
(active_scheduler_count_check) is a mirror of the current
active_scheduler_count and is used to confirm that a sleeping
thread has successfully woken up. This commit also changes the
logic to ensure that when threads are being woken up, they are
only sent a single signal (to minimize unnecessary work). If that
signal is missed, scheduler thread 0 notices that
active_scheduler_count != active_scheduler_count_check and sends
another signal, repeating until the thread wakes up and updates
the check variable. This should ensure that it is not possible to
have a hanging thread due to missed signals. Overall, these changes
should make dynamic scheduler scaling more robust, including
allowing for scheduler thread 0 to also suspend. (Scheduler
thread 0 suspension is experimental because @slfritchie ran into
an issue related to programs hanging when using network IO and
suspending scheduler thread 0 a few months ago. This commit
should in theory resolve this due to the "check variable" but I
haven't had an opportunity to test and confirm whether it
actually does so or not.)
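
The check-variable handshake described above can be sketched in C roughly as follows. This is a simplified, single-threaded model for illustration, not the runtime's actual code: only the two counter names come from this PR, while `sched_state_t`, `wake_one`, `thread0_recheck`, `confirm_awake`, and `lossy_send` are hypothetical names, and the "signal" is a plain function call that can be made to fail in order to imitate a missed wake-up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model of the two counters named in this PR.
typedef struct {
  atomic_uint active_scheduler_count;        // threads that should be awake
  atomic_uint active_scheduler_count_check;  // threads that confirmed waking
  uint32_t signals_sent;                     // bookkeeping for this example only
} sched_state_t;

// Stand-in for the platform wake-up signal; returns false if it was "missed".
typedef bool (*send_signal_fn)(uint32_t thread_index);

void sched_init(sched_state_t* s, uint32_t awake) {
  atomic_init(&s->active_scheduler_count, awake);
  atomic_init(&s->active_scheduler_count_check, awake);
  s->signals_sent = 0;
}

// Run by the woken thread itself once it is executing again.
void confirm_awake(sched_state_t* s) {
  // The check variable may only catch up to the target, never pass it.
  assert(atomic_load(&s->active_scheduler_count_check) <
         atomic_load(&s->active_scheduler_count));
  atomic_fetch_add(&s->active_scheduler_count_check, 1);
}

// Ask for one more active thread: raise the target and send exactly one signal.
void wake_one(sched_state_t* s, send_signal_fn send) {
  uint32_t idx = atomic_fetch_add(&s->active_scheduler_count, 1);
  s->signals_sent++;
  if (send(idx)) confirm_awake(s);  // models the woken thread confirming
}

// Run periodically by scheduler thread 0: on a mismatch, send one more signal.
// Returns true if a signal had to be re-sent.
bool thread0_recheck(sched_state_t* s, send_signal_fn send) {
  uint32_t want = atomic_load(&s->active_scheduler_count);
  uint32_t have = atomic_load(&s->active_scheduler_count_check);
  if (want == have) return false;   // every requested thread has confirmed
  s->signals_sent++;
  if (send(have)) confirm_awake(s);
  return true;
}

// Example-only sender that "loses" the first drop_remaining signals.
uint32_t drop_remaining = 0;
bool lossy_send(uint32_t thread_index) {
  (void)thread_index;
  if (drop_remaining > 0) { drop_remaining--; return false; }
  return true;
}
```

With `drop_remaining = 2`, one call to `wake_one` followed by repeated calls to `thread0_recheck` sends three signals in total before the two counters agree. That is the property this change relies on: a missed signal delays the wake-up but cannot leave the thread suspended indefinitely.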

This commit also adds a new command line option for dynamic
configuration of the threshold after which a thread is allowed to
suspend (and also separates it from the scheduler block threshold).
This new option is called --ponysuspendthreshold. This option
is needed because the current suspend threshold (about 1 ms) is
very aggressive and @slfritchie noted that scheduler threads
suspend and don't wake up even under high network load. This
option allows the threshold to be increased (or decreased)
on a per-application basis at run time. Additionally, it likely
makes sense to change the default to something more
"appropriate and balanced" based on some sort of fancy
heuristics/tests done with different values for the threshold.
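
As a rough illustration of what separating the two thresholds means, here is a minimal C sketch. The struct, field, and function names are hypothetical, and storing the values in nanoseconds is an assumption made for the example. Only the existence of a separate suspend threshold and its current value of about 1 ms come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical settings: the two thresholds are stored independently, so
// changing one no longer changes the other.
typedef struct {
  uint64_t block_threshold_ns;    // idle time before a thread reports "blocked"
  uint64_t suspend_threshold_ns;  // idle time before a thread may suspend
} sched_thresholds_t;

// Convert milliseconds to nanoseconds, guarding against overflow.
uint64_t ms_to_ns(uint64_t ms) {
  assert(ms <= UINT64_MAX / 1000000ULL);
  return ms * 1000000ULL;
}

// A thread may suspend only after being idle for at least the suspend threshold.
bool may_suspend(const sched_thresholds_t* t, uint64_t idle_ns) {
  return idle_ns >= t->suspend_threshold_ns;
}

// The block decision uses its own threshold and ignores the suspend setting.
bool may_report_blocked(const sched_thresholds_t* t, uint64_t idle_ns) {
  return idle_ns >= t->block_threshold_ns;
}
```

In this model, raising `suspend_threshold_ns` from 1 ms to 10 ms means a thread that has been idle for 2 ms keeps running instead of suspending, while `may_report_blocked` gives the same answer as before.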

sylvanc

This looks good. Dipin, are you comfortable with merging this, in terms of how well tested it is?

@slfritchie

dipinhora · 2018-06-27T20:50:19Z

@sylvanc I am (based on my limited testing). But then again, I tend to be overconfident to a fault. It's probably better to have someone not-me do some sort of check to ensure I didn't break something without realizing it.

Also... rebased to fix merge conflict.

dipinhora force-pushed the make_dss_better branch from 3b4c372 to 954cbe3 Compare June 21, 2018 01:06

sylvanc self-requested a review June 27, 2018 20:04

sylvanc approved these changes Jun 27, 2018

dipinhora force-pushed the make_dss_better branch from 954cbe3 to a8f67d6 Compare June 27, 2018 20:41

SeanTAllen added the changelog - added label Jul 11, 2018

SeanTAllen merged commit a44cfaf into ponylang:master Jul 11, 2018

ponylang-main added a commit that referenced this pull request Jul 11, 2018

Update CHANGELOG for PR #2801 [skip ci]

794dca6
