-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1279 gossiplb inform and decide have bugs #1348
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think it's fine if we merge this now. For develop, we could improve the documentation better (doxygen comments) and need to fix the prints since that API was refactored last week.
int n_transfers_ = 0; | ||
}; | ||
|
||
struct GossipRejectionStatsMsg : collective::ReduceTMsg<RejectionStats> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lifflander Would it be possible to avoid the need to define types like this, by just passing through constructor arguments to the encapsulated type T
in ReduceTMsg<T>
?
c8b5260
to
242282c
Compare
Codecov Report
@@ Coverage Diff @@
## develop #1348 +/- ##
===========================================
+ Coverage 82.50% 82.60% +0.09%
===========================================
Files 759 759
Lines 28279 28506 +227
===========================================
+ Hits 23333 23546 +213
- Misses 4946 4960 +14
|
Includes many bug fixes and improvements to GossipLB. The following LB args were added:
trials
(default3
): how many times to repeat the requested number of iterations, hoping to find a better imbalance; helps if it’s easy to get stuck in a local minimumdeterministic
(default false): for debugging purposes, make the migration decision deterministic assuming deterministic loads (for testing on deterministic loads, consider using the driver in development under Implement general test driver for LB testing #1265)inform
(default SyncInform): choice of gossiping approach0
): synchronous propagates after all recvs for a round, but has sync cost (matches LBAF approach)1
): asynchronous propagates after the first recv for a round, but avoids sync costordering
(default Marginal): order in which to evaluate local objects for migration0
): use the unordered_map iteration order1
): order by ascending element ID2
): order by descending load starting with the object of marginal load, then order ascending for larger loadscmf
(default NormByMax): the algorithm used for computing the CMF0
): remove processors from the CMF as soon as they exceed the target (e.g., processor-average) load; use a CMF factor of 1.0/x, where x is the target load1
): do not remove processors from the CMF that exceed the target load until the next iteration; use a CMF factor of 1.0x, where x is the maximum of the target load and the most loaded processor in the CMF2
): do not remove processors from the CMF that exceed the target load until the next iteration; use a CMF factor of 1.0x, where x is the load of the processor that is computing the CMFrollback
(default true): whether to roll back to an earlier iteration if it had the best imbalancetargetpole
(default false): whether to replace the processor-average load with the max of that and the maximum object load, effectively redefining overloaded/underloaded based on the longest pole load when it exceeds the processor-average loadNote that I have a both a develop-based branch (this PR;
1279-gossiplb-inform-and-decide-have-bugs
) and a release-based branch (1279-gossiplb-inform-and-decide-have-bugs-release
).Closes #1279