Adaptive scheduler #85

petervdonovan · 2022-05-31T16:05:02Z

To clarify -- this is the scheduler that previously had the vague name "heuristic."

For details, please see PR #1207 in the main repository.

A timer event seems to be occasionally dropped in TimeLimitThreaded.

TODO: compute_number_of_workers should be reduced correspondingly?

Marked improvement observed locally for RadixSort, Counting.

This optimization is lifted directly from the NP scheduler in response to visible differences in the number of cycles spent accessing mutexes. Clear improvement is observed locally for the Counting benchmark.

This is another step in the direction of the NP scheduler. The condition for level advancement should be that _everyone_ is sleeping; otherwise, we cannot optimize out the lock for level advancement without introducing a race condition.
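As an illustration of why the "everyone is sleeping" condition matters, here is a minimal sketch using C11 atomics. The names `sched_sketch_t`, `num_awake`, and `try_advance_level` are hypothetical and are not taken from the runtime sources. The sketch assumes that a worker which decrements the awake counter to zero is, by construction, the only thread still running, so it may advance the level without taking a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical scheduler state, for illustration only.
typedef struct {
    atomic_size_t num_awake;  // workers that have not yet gone to sleep
    size_t level;             // written only by the last awake worker
} sched_sketch_t;

// Called by a worker that has run out of work at the current level.
// Returns true if the caller was the last awake worker and advanced the level.
static bool try_advance_level(sched_sketch_t* s) {
    // atomic_fetch_sub returns the value held before the subtraction.
    size_t prev = atomic_fetch_sub(&s->num_awake, 1);
    assert(prev >= 1);
    if (prev != 1) {
        return false;  // someone else is still awake; the caller should sleep
    }
    // Every other worker is asleep, so no other thread can touch `level`.
    s->level++;
    // Only the caller is awake until it wakes the others (not shown).
    atomic_store(&s->num_awake, 1);
    return true;
}
```

If the check were weakened to "most workers are sleeping," two threads could reach the increment at once, which is the race the paragraph above warns about.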

Some of our benchmarks need either worker affinity OR use of just one worker, so that a particular reaction is always executed by the same worker. Perhaps worker affinity is easier to add. However, it must be combined with some sort of load-balancing strategy.

This is a very crude implementation that should be improved upon.
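One possible shape for combining worker affinity with a load-balancing fallback is sketched below. Both helpers are hypothetical and are not the implementation in this PR: `home_worker` pins a reaction to a fixed worker by index, and `steal_victim` gives the ring order in which an idle worker might probe its neighbors for work.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical: every reaction has a fixed "home" worker, so that it is
// always executed by the same worker unless it is stolen.
static size_t home_worker(size_t reaction_index, size_t num_workers) {
    assert(num_workers > 0);
    return reaction_index % num_workers;
}

// Hypothetical: on its `attempt`-th try (1 .. num_workers - 1), an idle
// worker probes the worker `attempt` positions after itself in a ring.
static size_t steal_victim(size_t self, size_t attempt, size_t num_workers) {
    assert(num_workers > 0);
    return (self + attempt) % num_workers;
}
```

The trade-off in a scheme like this is that stealing breaks affinity for the stolen reaction, so a real implementation would have to decide when that is acceptable.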

*Although this fixes a race condition, I have not tested whether it fixes "the" race condition. The bug manifested itself so rarely that it was infeasible to reproduce.

Symptom: A thread goes to the event queue to wait for the next event and ends up waiting forever (while the other threads wait for work). This is possible only in SleepingBarber due to use of physical actions -- not in other benchmarks. This bug is not caught by our tests.

The level counter is part of the predicate associated with the condition variable. It is unsafe to change without acquiring a mutex.
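The general rule can be sketched with POSIX threads as follows. The struct and function names are hypothetical and do not come from the runtime. The point is only that the writer changes the predicate (`level`) while holding the same mutex the waiter uses, so the change cannot slip in between the waiter's check and its call to `pthread_cond_wait`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical state, for illustration only.
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t level_changed;
    size_t level;  // part of the predicate that waiters test
} level_state_t;

// Waiter: blocks until the level differs from the value it last saw.
static size_t wait_for_level_after(level_state_t* s, size_t seen) {
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    while (s->level == seen) {  // loop guards against spurious wakeups
        pthread_cond_wait(&s->level_changed, &s->mutex);
    }
    size_t now = s->level;
    pthread_mutex_unlock(&s->mutex);
    return now;
}

// Writer: modifies the predicate only while holding the mutex.
static void advance_level(level_state_t* s) {
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    s->level++;
    pthread_cond_broadcast(&s->level_changed);
    pthread_mutex_unlock(&s->mutex);
}
```

Without the lock in `advance_level`, a waiter could test the predicate, be preempted, miss the broadcast, and then sleep indefinitely, which matches the "waiting forever" symptom described above.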

If I am right, the expected number of trials required to go from an intermediate number of workers to an extreme (1 or the maximum) was previously quadratic in the number of workers (because it behaved like a random walk) -- no good. This should make it (log(n))^2.
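For reference, the quadratic figure matches the standard exit-time result for a symmetric random walk. The derivation below assumes each trial moves the worker count up or down by exactly one with equal probability, which is an idealization and not something stated in the commit. It does not derive the (log(n))^2 bound, which depends on details of the new step rule.

```latex
% E_k: expected number of steps for a symmetric +/-1 walk on {0, ..., n},
% started at k, to first reach either endpoint.
E_0 = E_n = 0, \qquad
E_k = 1 + \tfrac{1}{2}\,(E_{k-1} + E_{k+1}) \quad (0 < k < n)
\;\Longrightarrow\;
E_k = k\,(n - k), \qquad
\max_k E_k = E_{n/2} = \frac{n^2}{4}.
```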

This kind of bug reflects lack of care on my part in finding a clean implementation. Some cleanup will be necessary if this is ever to be merged.

I did not observe a race condition, but in principle I think there was one. For this reason, this change could not be justified by empirical results.

Again, this is another optimization borrowed from the NP scheduler. The difference is that the NP scheduler is designed such that this optimization needs no explicit code to handle it; there, it just works.

By "unrelated" I mean, "unrelated to the hard-to-reproduce race condition that is manifesting itself in LoopDistributedCentralized in CI."

Soroosh129

Looks great!

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 2 times, most recently from d32053a to 56bc3fb Compare May 31, 2022 16:24

petervdonovan added 28 commits May 31, 2022 11:26

[scheduler] 25/28 tests passing for NP2.

a78e735

[scheduler] Tests passing for NP2.

505c5eb

[scheduler] First possible performance improvement.

086e7d0

A timer event seems to be occasionally dropped in TimeLimitThreaded.

[scheduler] Include assert.h.

9a40c6e

[scheduler] SortedLinkList: Make C++ compiler happy.

5885ec8

[scheduler] Make compiler happy.

746af9a

[scheduler] Reduce expensive calls to get_physical_time.

a65531d

TODO: compute_number_of_workers should be reduced correspondingly?

[scheduler] Do not use more workers than there are reactions.

022b874

[scheduler] Try again to keep C++ and C compilers both happy.

ece6875

[scheduler] Small adjustments.

ae97fbb

[scheduler] Try to dynamically optimize num workers.

c34ef01

[scheduler] First possible improvement due to runtime analysis.

10736a0

[scheduler] Dirty patch on the heuristic.

bb37043

[scheduler] Advance level as far as necessary at once.

1466416

Marked improvement observed locally for RadixSort, Counting.

[scheduler] If everyone else is sleeping, do not acquire the mutex.

09b18cc

This optimization is lifted directly from the NP scheduler in response to visible differences in the number of cycles spent accessing mutexes. Clear improvement is observed locally for the Counting benchmark.

[scheduler] Fix bug caused by previous commit.

34c1334

This is another step in the direction of the NP scheduler. The condition for level advancement should be that _everyone_ is sleeping; otherwise, we cannot optimize out the lock for level advancement without introducing a race condition.

[scheduler] Adjust data_collection.h.

c03077b

[scheduler] Adjust data_collection.h.

e58497d

[scheduler] Add work stealing.

4fdcc09

This is a very crude implementation that should be improved upon.

[scheduler] Fix the race condition.*

5b972f0

*Although this fixes a race condition, I have not tested whether it fixes "the" race condition. The bug manifested itself so rarely that it was infeasible to reproduce.

[scheduler] Fix another race condition.

59afce7

The level counter is part of the predicate associated with the condition variable. It is unsafe to change without acquiring a mutex.

[scheduler] Adjust data_collection.h.

4830702

If I am right, the expected number of trials required to go from an intermediate number of workers to an extreme (1 or the maximum) was previously quadratic in the number of workers (because it behaved like a random walk) -- no good. This should make it (log(n))^2.

[scheduler] Bugfix.

17120ad

This kind of bug reflects lack of care on my part in finding a clean implementation. Some cleanup will be necessary if this is ever to be merged.

[scheduler] Fix another race condition.

c6303d1

I did not observe a race condition, but in principle I think there was one. For this reason, this change could not be justified by empirical results.

[scheduler] Minor cleanup.

0625c7c

[scheduler] Check number of workers required more dynamically.

3be18ed

Again, this is another optimization borrowed from the NP scheduler. The difference is that the NP scheduler is designed such that this optimization needs no explicit code to handle it; there, it just works.

petervdonovan added 5 commits May 31, 2022 22:10

[scheduler] Rename "heuristic" -> "adaptive"

c8d697a

Update lingua-franca-ref.txt.

701cfad

[scheduler] Remove a special case.

29cf529

[scheduler] Superficial commenting/renaming.

eb202c9

[scheduler] Adjust assert.

e09ad62

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 6 times, most recently from f7b48fd to e2ee83f Compare June 3, 2022 21:41

[scheduler] Account for insertion into the current level.

d682724

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from e2ee83f to d682724 Compare June 3, 2022 22:55

petervdonovan added 2 commits June 3, 2022 18:47

[scheduler] Unrelated bugfix.

510cd4c

By "unrelated" I mean, "unrelated to the hard-to-reproduce race condition that is manifesting itself in LoopDistributedCentralized in CI."

[scheduler] Update assertion.

ea3011b

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch 2 times, most recently from 504885c to 5f3b2f0 Compare June 7, 2022 01:31

[scheduler] Fix deadlock.

c19bb3d

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from 5f3b2f0 to c19bb3d Compare June 7, 2022 03:59

petervdonovan added 2 commits June 6, 2022 22:02

[scheduler] Clean up after previous commit.

f1e20d5

[scheduler] Do not directly use compiler builtins.

d23d17f

petervdonovan force-pushed the scaling-wrt-threads-runtime-experiments branch from b6cb486 to d23d17f Compare June 9, 2022 06:11

petervdonovan changed the title ~~Heuristic-based scheduler~~ Adaptive scheduler Jun 9, 2022

[scheduler] Minor cleanups.

3e1ab3e

petervdonovan marked this pull request as ready for review June 9, 2022 17:36

Soroosh129 approved these changes Jun 20, 2022

Merge branch 'main' into scaling-wrt-threads-runtime-experiments

4d300d9

petervdonovan merged commit ec93fdf into main Jun 20, 2022

petervdonovan deleted the scaling-wrt-threads-runtime-experiments branch June 20, 2022 17:08

lhstrh added the feature New feature label Jan 26, 2023
