#476 allow migrations to same node #1729

Merged
merged 13 commits into from
May 26, 2022
59 changes: 32 additions & 27 deletions docs/md/lb-manager.md
@@ -13,8 +13,12 @@ To enable load balancing, the cmake flag \code{.cmake} -Dvt_lb_enabled=1
instrumentation of work and communication performed by collection elements.

To run a load balancer at runtime:
- Pass `--vt_lb --vt_lb_name=<LB>` as a command line argument
- Write a LB specification file `--vt_lb --vt_lb_file_name=<FILE>`

- Pass `--vt_lb --vt_lb_name=<LB>` as a command line argument
- Write a LB specification file `--vt_lb --vt_lb_file_name=<FILE>`
- Optionally pass `--vt_lb_self_migration` as a command line argument to allow the load balancer to migrate objects to the same node

Note that one should use either `--vt_lb_name` or `--vt_lb_file_name` option, not both.
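
The "either, not both" rule above can be sketched as a small check. This is only an illustration: `LbSelection` and `checkLbSelection` are hypothetical names, not part of \vt, and the sketch assumes an option that was not passed is represented by an empty string (the real `vt_lb_name` defaults to `"NoLB"`).

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the two LB selection options; not vt's AppConfig.
struct LbSelection {
  std::string lb_name;       // value of --vt_lb_name, empty if not given (assumption)
  std::string lb_file_name;  // value of --vt_lb_file_name, empty if not given
};

// Returns an error message when both options are set, std::nullopt otherwise.
std::optional<std::string> checkLbSelection(LbSelection const& sel) {
  if (!sel.lb_name.empty() && !sel.lb_file_name.empty()) {
    return std::string{"use either --vt_lb_name or --vt_lb_file_name, not both"};
  }
  return std::nullopt;
}
```

A caller would run such a check once after argument parsing and report the message if one is returned.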

\section lb-specification-file LB Specification File

@@ -51,15 +55,16 @@ To print LB specification during startup, use `--vt_lb_show_spec` command line f

\section load-balancers Load balancers

| Load Balancer | Type | Description | Reference |
| -------------- | ----------------------- | ---------------------------------------------- | --------- |
| RotateLB | Testing | Rotate objects in a ring | `vt::vrt::collection::lb::RotateLB` |
| RandomLB | Testing | Randomly migrate object with seed | `vt::vrt::collection::lb::RandomLB` |
| GreedyLB | Centralized | Gather to central node apply min/max heap | `vt::vrt::collection::lb::GreedyLB` |
| TemperedLB | Distributed | Inspired by epidemic algorithms | `vt::vrt::collection::lb::TemperedLB` |
| HierarchicalLB | Hierarchical | Build tree to move objects nodes | `vt::vrt::collection::lb::HierarchicalLB` |
| ZoltanLB | Hyper-graph Partitioner | Run Zoltan in hyper-graph mode to LB | `vt::vrt::collection::lb::ZoltanLB` |
| OfflineLB | User-specified | Read file to determine mapping | `vt::vrt::collection::lb::OfflineLB` |
| Load Balancer | Type | Description | Reference |
| ------------------- | ----------------------- | ----------------------------------------------------------------------------------- | ---------------------------------------------- |
| RotateLB | Testing | Rotate objects in a ring | `vt::vrt::collection::lb::RotateLB` |
| RandomLB | Testing | Randomly migrate object with seed | `vt::vrt::collection::lb::RandomLB` |
| GreedyLB | Centralized | Gather to central node apply min/max heap | `vt::vrt::collection::lb::GreedyLB` |
| TemperedLB | Distributed | Inspired by epidemic algorithms | `vt::vrt::collection::lb::TemperedLB` |
| HierarchicalLB | Hierarchical | Build a tree to move objects between nodes | `vt::vrt::collection::lb::HierarchicalLB` |
| ZoltanLB | Hyper-graph Partitioner | Run Zoltan in hyper-graph mode to LB | `vt::vrt::collection::lb::ZoltanLB` |
| OfflineLB | User-specified | Read file to determine mapping | `vt::vrt::collection::lb::OfflineLB` |
| TestSerializationLB | Testing | Migrate objects to the same node, to test serialization/deserialization | `vt::vrt::collection::lb::TestSerializationLB` |

\section load-models Object Load Models

@@ -104,22 +109,22 @@ times those objects will take in all future phases.

The full set of load model classes provided with \vt is as follows

| Load Model | Description | Reference |
| ------------------ | --------------------------------------------------- | --------- |
| **Utilities** | | |
| LoadModel | Pure virtual interface class, which the following implement | `vt::vrt::collection::balance::LoadModel` |
| ComposedModel | A convenience class for most implementations to inherit from, that passes unmodified calls through to an underlying model instance | `vt::vrt::collection::balance::ComposedModel` |
| RawData | Returns historical data only, from the measured times | `vt::vrt::collection::balance::RawData` |
| **Transformers** | Transforms the values computed by the composed model(s), agnostic to whether a query refers to a past or future phase | |
| Norm | When asked for a `WHOLE_PHASE` value, computes a specified l-norm over all subphases | `vt::vrt::collection::balance::Norm` |
| SelectSubphases | Filters and remaps the subphases with data present in the underlying model | `vt::vrt::collection::balance::SelectSubphases` |
| CommOverhead | Adds a specified amount of imputed 'system overhead' time to each object's work based on the number of messages received | `vt::vrt::collection::balance::CommOverhead` |
| PerCollection | Maintains a set of load models associated with different collection instances, and passes queries for an object through to the model corresponding to its collection | `vt::vrt::collection::balance::PerCollection` |
| **Predictors** | Computes values for future phase queries, and passes through past phase queries | |
| NaivePersistence | Passes through historical queries, and maps all future queries to the most recent past phase | `vt::vrt::collection::balance::NaivePersistence` |
| PersistenceMedianLastN | Similar to NaivePersistence, except that it predicts based on a median over the N most recent phases | `vt::vrt::collection::balance::PersistenceMedianLastN` |
| LinearModel | Computes a linear regression over an object's loads from a number of recent phases | `vt::vrt::collection::balance::LinearModel` |
| MultiplePhases | Computes values for future phases based on sums of the underlying model's predictions for N corresponding future phases | `vt::vrt::collection::balance::MultiplePhases` |
| Load Model | Description | Reference |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| **Utilities** | | |
| LoadModel | Pure virtual interface class, which the following implement | `vt::vrt::collection::balance::LoadModel` |
| ComposedModel | A convenience class for most implementations to inherit from, that passes unmodified calls through to an underlying model instance | `vt::vrt::collection::balance::ComposedModel` |
| RawData | Returns historical data only, from the measured times | `vt::vrt::collection::balance::RawData` |
| **Transformers** | Transforms the values computed by the composed model(s), agnostic to whether a query refers to a past or future phase | |
| Norm | When asked for a `WHOLE_PHASE` value, computes a specified l-norm over all subphases | `vt::vrt::collection::balance::Norm` |
| SelectSubphases | Filters and remaps the subphases with data present in the underlying model | `vt::vrt::collection::balance::SelectSubphases` |
| CommOverhead | Adds a specified amount of imputed 'system overhead' time to each object's work based on the number of messages received | `vt::vrt::collection::balance::CommOverhead` |
| PerCollection | Maintains a set of load models associated with different collection instances, and passes queries for an object through to the model corresponding to its collection | `vt::vrt::collection::balance::PerCollection` |
| **Predictors** | Computes values for future phase queries, and passes through past phase queries | |
| NaivePersistence | Passes through historical queries, and maps all future queries to the most recent past phase | `vt::vrt::collection::balance::NaivePersistence` |
| PersistenceMedianLastN | Similar to NaivePersistence, except that it predicts based on a median over the N most recent phases | `vt::vrt::collection::balance::PersistenceMedianLastN` |
| LinearModel | Computes a linear regression over an object's loads from a number of recent phases | `vt::vrt::collection::balance::LinearModel` |
| MultiplePhases | Computes values for future phases based on sums of the underlying model's predictions for N corresponding future phases | `vt::vrt::collection::balance::MultiplePhases` |
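
As a rough illustration of the predictor idea in the table, the sketch below computes a median over the N most recent phase loads, which is the kind of prediction the PersistenceMedianLastN row describes. The function `medianOfLastN` is a hypothetical helper written for this example; it is not the \vt class, and the zero result for empty history is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of vt: predicts a future load as the median
// of the n most recent entries of `history` (ordered oldest first).
double medianOfLastN(std::vector<double> const& history, std::size_t n) {
  if (history.empty() || n == 0) {
    return 0.0;  // assumption: no data means a zero prediction
  }
  std::size_t const count = std::min(n, history.size());
  // Copy the most recent `count` loads so the caller's history stays ordered.
  std::vector<double> window(
    history.end() - static_cast<std::ptrdiff_t>(count), history.end()
  );
  std::sort(window.begin(), window.end());
  std::size_t const mid = count / 2;
  // Even-sized windows average the two middle values.
  return count % 2 == 1 ? window[mid] : (window[mid - 1] + window[mid]) / 2.0;
}
```

Compared with mapping every future query to the single most recent phase, a median over several phases is less sensitive to one outlier phase.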

All of the provided load balancers described in the previous section
require that the installed load model provide responses to future
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -89,6 +89,7 @@ set(
vrt/collection/balance/offlinelb
vrt/collection/balance/zoltanlb
vrt/collection/balance/randomlb
vrt/collection/balance/testserializationlb
vrt/collection/balance/lb_invoke
vrt/collection/balance/model
vrt/collection/balance/proxy
1 change: 1 addition & 0 deletions src/vt/collective/collective_ops.cc
@@ -137,6 +137,7 @@ void printOverwrittens(
printIfOverwritten(vt_lb_data_file);
printIfOverwritten(vt_lb_data_dir_in);
printIfOverwritten(vt_lb_data_file_in);
printIfOverwritten(vt_lb_self_migration);
printIfOverwritten(vt_help_lb_args);
printIfOverwritten(vt_no_detect_hang);
printIfOverwritten(vt_print_no_progress);
24 changes: 13 additions & 11 deletions src/vt/configs/arguments/app_config.h
@@ -135,21 +135,22 @@ struct AppConfig {
bool vt_trace_event_polling = false;
bool vt_trace_irecv_polling = false;

bool vt_lb = false;
bool vt_lb_show_spec = false;
bool vt_lb_quiet = false;
std::string vt_lb_file_name = "";
std::string vt_lb_name = "NoLB";
std::string vt_lb_args = "";
int32_t vt_lb_interval = 1;
bool vt_lb_keep_last_elm = false;
bool vt_lb_data = false;
bool vt_lb_data_compress = true;
bool vt_lb = false;
bool vt_lb_show_spec = false;
bool vt_lb_quiet = false;
std::string vt_lb_file_name = "";
std::string vt_lb_name = "NoLB";
std::string vt_lb_args = "";
int32_t vt_lb_interval = 1;
bool vt_lb_keep_last_elm = false;
bool vt_lb_data = false;
bool vt_lb_data_compress = true;
std::string vt_lb_data_dir = "vt_lb_data";
std::string vt_lb_data_file = "data.%p.json";
std::string vt_lb_data_dir_in = "vt_lb_data_in";
std::string vt_lb_data_file_in = "data.%p.json";
bool vt_help_lb_args = false;
bool vt_help_lb_args = false;
bool vt_lb_self_migration = false;

bool vt_no_detect_hang = false;
bool vt_print_no_progress = true;
@@ -314,6 +315,7 @@ struct AppConfig {
| vt_lb_data_dir_in
| vt_lb_data_file_in
| vt_help_lb_args
| vt_lb_self_migration

| vt_no_detect_hang
| vt_print_no_progress
32 changes: 17 additions & 15 deletions src/vt/configs/arguments/args.cc
@@ -475,27 +475,28 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
auto lb_data_file = "Load balancing data output file name";
auto lb_data_dir_in = "Load balancing data input directory";
auto lb_data_file_in = "Load balancing data input file name";
auto lb_self_migration = "Allow load balancer to migrate objects to the same node";
auto lbn = "NoLB";
auto lbi = 1;
auto lbf = "";
auto lbd = "vt_lb_data";
auto lbs = "data";
auto lba = "";
auto s = app.add_flag("--vt_lb", appConfig.vt_lb, lb);
auto t1 = app.add_flag("--vt_lb_quiet", appConfig.vt_lb_quiet, lb_quiet);
auto u = app.add_option("--vt_lb_file_name", appConfig.vt_lb_file_name, lb_file_name, lbf)
->check(CLI::ExistingFile);
auto u1 = app.add_flag("--vt_lb_show_spec", appConfig.vt_lb_show_spec, lb_show_spec);
auto v = app.add_option("--vt_lb_name", appConfig.vt_lb_name, lb_name, lbn);
auto v1 = app.add_option("--vt_lb_args", appConfig.vt_lb_args, lb_args, lba);
auto w = app.add_option("--vt_lb_interval", appConfig.vt_lb_interval, lb_interval, lbi);
auto wl = app.add_flag("--vt_lb_keep_last_elm", appConfig.vt_lb_keep_last_elm, lb_keep_last_elm);
auto ww = app.add_flag("--vt_lb_data", appConfig.vt_lb_data, lb_data);
auto xz = app.add_flag("--vt_lb_data_compress", appConfig.vt_lb_data_compress, lb_data_comp);
auto wx = app.add_option("--vt_lb_data_dir", appConfig.vt_lb_data_dir, lb_data_dir, lbd);
auto wy = app.add_option("--vt_lb_data_file", appConfig.vt_lb_data_file, lb_data_file,lbs);
auto xx = app.add_option("--vt_lb_data_dir_in", appConfig.vt_lb_data_dir_in, lb_data_dir_in, lbd);
auto xy = app.add_option("--vt_lb_data_file_in", appConfig.vt_lb_data_file_in, lb_data_file_in,lbs);
auto s = app.add_flag("--vt_lb", appConfig.vt_lb, lb);
auto t1 = app.add_flag("--vt_lb_quiet", appConfig.vt_lb_quiet, lb_quiet);
auto u = app.add_option("--vt_lb_file_name", appConfig.vt_lb_file_name, lb_file_name, lbf)->check(CLI::ExistingFile);
auto u1 = app.add_flag("--vt_lb_show_spec", appConfig.vt_lb_show_spec, lb_show_spec);
auto v = app.add_option("--vt_lb_name", appConfig.vt_lb_name, lb_name, lbn);
auto v1 = app.add_option("--vt_lb_args", appConfig.vt_lb_args, lb_args, lba);
auto w = app.add_option("--vt_lb_interval", appConfig.vt_lb_interval, lb_interval, lbi);
auto wl = app.add_flag("--vt_lb_keep_last_elm", appConfig.vt_lb_keep_last_elm, lb_keep_last_elm);
auto ww = app.add_flag("--vt_lb_data", appConfig.vt_lb_data, lb_data);
auto xz = app.add_flag("--vt_lb_data_compress", appConfig.vt_lb_data_compress, lb_data_comp);
auto wx = app.add_option("--vt_lb_data_dir", appConfig.vt_lb_data_dir, lb_data_dir, lbd);
auto wy = app.add_option("--vt_lb_data_file", appConfig.vt_lb_data_file, lb_data_file,lbs);
auto xx = app.add_option("--vt_lb_data_dir_in", appConfig.vt_lb_data_dir_in, lb_data_dir_in, lbd);
auto xy = app.add_option("--vt_lb_data_file_in", appConfig.vt_lb_data_file_in, lb_data_file_in, lbs);
auto lbasm = app.add_flag("--vt_lb_self_migration", appConfig.vt_lb_self_migration, lb_self_migration);

auto debugLB = "Load Balancing";
s->group(debugLB);
@@ -512,6 +513,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
xx->group(debugLB);
xy->group(debugLB);
xz->group(debugLB);
lbasm->group(debugLB);

// help options deliberately omitted from the debugLB group above so that
// they appear grouped with --vt_help when --vt_help is used
12 changes: 7 additions & 5 deletions src/vt/messaging/active.cc
@@ -217,10 +217,10 @@ EventType ActiveMessenger::sendMsgBytesWithPut(
TagType const& send_tag
) {
auto msg = base.get();
auto const& is_term = envelopeIsTerm(msg->env);
auto const& is_put = envelopeIsPut(msg->env);
auto const& is_put_packed = envelopeIsPackedPutType(msg->env);
auto const& is_bcast = envelopeIsBcast(msg->env);
auto const is_term = envelopeIsTerm(msg->env);
auto const is_put = envelopeIsPut(msg->env);
auto const is_put_packed = envelopeIsPackedPutType(msg->env);
auto const is_bcast = envelopeIsBcast(msg->env);

if (!is_term || vt_check_enabled(print_term_msgs)) {
vt_debug_print(
@@ -231,7 +231,9 @@ EventType ActiveMessenger::sendMsgBytesWithPut(
}

vtWarnIf(
!(dest != theContext()->getNode() || is_bcast),
dest == theContext()->getNode() &&
not is_bcast &&
not theConfig()->vt_lb_self_migration,
fmt::format("Destination {} should != this node", dest)
);
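
The rewritten warning condition is the De Morgan expansion of the old one with one extra term for the new flag. A minimal sketch of the two predicates, using hypothetical free functions (`warnOld`, `warnNew`) and plain `int` node ids in place of the \vt types:

```cpp
// Hypothetical free functions mirroring the two forms of the condition;
// plain ints stand in for vt's node type.

// Old form: warn unless the destination differs from this node or the
// message is a broadcast.
bool warnOld(int dest, int this_node, bool is_bcast) {
  return !(dest != this_node || is_bcast);
}

// New form: same condition written positively, additionally suppressed
// when self-migration has been enabled.
bool warnNew(int dest, int this_node, bool is_bcast, bool self_migration) {
  return dest == this_node && !is_bcast && !self_migration;
}
```

With `self_migration` set to false the two forms agree on every input, so behavior is unchanged unless the flag is passed.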

10 changes: 8 additions & 2 deletions src/vt/runtime/runtime_banner.cc
@@ -290,9 +290,15 @@ void Runtime::printStartupBanner() {
if (getAppConfig()->vt_lb) {
auto f9 = opt_on("--vt_lb", "Load balancing enabled");
fmt::print("{}\t{}{}", vt_pre, f9, reset);

auto f10 = opt_on("--vt_lb_self_migration", "Self migration enabled");
fmt::print("{}\t{}{}", vt_pre, f10, reset);

if (getAppConfig()->vt_lb_file_name != "") {
auto f12 = fmt::format("Reading LB specification from file \"{}\"",
getAppConfig()->vt_lb_file_name);
auto f12 = fmt::format(
"Reading LB specification from file \"{}\"",
getAppConfig()->vt_lb_file_name
);
auto f11 = opt_on("--vt_lb_file_name", f12);
fmt::print("{}\t{}{}", vt_pre, f11, reset);

4 changes: 2 additions & 2 deletions src/vt/serialization/messaging/serialized_messenger.impl.h
@@ -405,9 +405,9 @@ template <typename MsgT, typename BaseT>
"serialMsgHandler: local msg: handler={}\n", typed_handler
);

auto base_msg = msg.template to<BaseMsgType>();
auto base_msg = user_msg.template to<BaseMsgType>();
return messaging::PendingSend(base_msg, [=](MsgPtr<BaseMsgType> in) {
runnable::makeRunnable(msg, true, typed_handler, node)
runnable::makeRunnable(user_msg, true, typed_handler, node)
.withTDEpochFromMsg()
.enqueue();
});
9 changes: 4 additions & 5 deletions src/vt/vrt/collection/balance/baselb/baselb.cc
@@ -142,11 +142,10 @@ std::shared_ptr<const balance::Reassignment> BaseLB::normalizeReassignments() {
auto const new_node = std::get<1>(transfer);
auto const current_node = obj_id.curr_node;

// self-migration
if (current_node == new_node) {
// Filter out self-migrations entirely
continue;
// vtAbort("Not currently implemented -- self-migration");
vt_debug_print(
verbose, lb, "BaseLB::normalizeReassignments(): self migration\n"
);
}

// the object lives here, so it's departing.
@@ -228,7 +227,7 @@ void BaseLB::notifyNewHostNodeOfObjectsArriving(
}

void BaseLB::migrateObjectTo(ObjIDType const obj_id, NodeType const to) {
if (obj_id.curr_node != to) {
if (obj_id.curr_node != to || theConfig()->vt_lb_self_migration) {
transfers_.push_back(TransferDestType{obj_id, to});
}
}
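
The effect of the changed guard in `migrateObjectTo` can be shown in isolation. The sketch below is a simplified stand-in, not the \vt implementation: `NodeId`, `ObjId`, and `TransferList` are hypothetical types, and the flag is a plain member instead of a lookup through `theConfig()`.

```cpp
#include <utility>
#include <vector>

using NodeId = int;  // hypothetical stand-in for vt's NodeType

// Hypothetical, simplified object id carrying only what the guard needs.
struct ObjId {
  int id;
  NodeId curr_node;
};

struct TransferList {
  bool allow_self_migration = false;  // models --vt_lb_self_migration
  std::vector<std::pair<int, NodeId>> transfers;

  // Records a transfer unless it is a self-migration and those are disabled.
  void migrateObjectTo(ObjId const& obj, NodeId to) {
    if (obj.curr_node != to || allow_self_migration) {
      transfers.emplace_back(obj.id, to);
    }
  }
};
```

With the flag off, a request to move an object to its current node is dropped as before; with the flag on, it is recorded like any other transfer, which is what lets a same-node migration exercise serialization.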