410 Dependent Epochs #423
Is there documentation somewhere on the intended semantics this is meant to provide? For ease of review and timeliness of integration, could you post pull requests for the pieces of this that are obvious improvements, rather than a complicated new feature? E.g. the [...]
@PhilMiller I could try to pull them apart. In this case, it may be more effort than the value it provides. Dependent epochs were the driving force behind the improvements because of how often lookups occur with them. Much of the development is interleaved. I will work on some documentation now for this.
I will write better documentation in the code in the future. Here's a synopsis of the new feature:

Regular, non-dependent epochs always deliver messages with a given (rooted or collective) epoch in the envelope immediately. However, the user can hold back actions until an epoch terminates, and thus sequence events after that termination.

Epoch creation (collective or rooted) is an asynchronous operation by design. Thus, a collective creation of an epoch can occur on one node much earlier than on another, and messages with that new collective epoch can be delivered immediately, before another node even participates in the new collective epoch construction! This behavior is a feature rather than a bug: it makes the whole execution more asynchronous.

Dependent epochs allow the user to actually hold back delivery anywhere in the system when a message arrives. This allows nodes to independently make progress and send messages without considering the current state of another node (whether the node is "ready", or in the right state, to accept them).

Dependent epochs are categorized statically in the system by setting a high bit in the [...]. By default, dependent epochs are not "released" on a node, objgroup, or virtual collection element, so messages carrying a dependent epoch are not delivered. The system automatically buffers them at the destination until the epoch is released there.

Creating a dependent epoch is analogous to creating a non-dependent epoch:

    auto epoch = vt::theTerm()->makeEpochCollectiveDep();
    if (node == 0) {
      auto msg = vt::makeSharedMessage<TestMsg>();
      vt::theMsg()->sendMsg<TestMsg, myHandler>(1, msg);
    }
    vt::theTerm()->finishedEpoch(epoch);

In this case,

    if (node == 1) {
      vt::theTerm()->releaseEpoch(epoch);
    }

You can also enqueue an action, or check if it's been released:

    if (node == 1) {
      vt::theTerm()->onReleaseEpoch(epoch, []{});
      bool test = vt::theTerm()->epochReleased(epoch);
    }

This same logic applies to objgroups and virtual collections:

    auto proxy = vt::theCollection()->constructCollective(range);
    // Release/check a specific element, may be remote
    proxy[3].release(epoch);
    proxy[3].whenReleased(epoch, []{});
    bool test = proxy[3].isReleased(epoch);
    // Release the epoch across the whole collection
    proxy.release(epoch);
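To make the buffering semantics described above concrete, here is a small standalone C++ sketch. It is a toy model only, not the code in this PR and not the vt API: the `Epoch` alias, `kDependentBit`, `makeDependent`, `isDependent`, and the `Destination` class are all hypothetical names invented for illustration, and the choice of bit 63 as the "dependent" bit is an assumption. It only mirrors the behavior stated in the synopsis: messages in a dependent epoch are held at the destination until that destination releases the epoch, and actions can be queued to run on release.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical epoch encoding: the top bit marks an epoch as dependent.
using Epoch = std::uint64_t;
constexpr Epoch kDependentBit = Epoch{1} << 63;

constexpr Epoch makeDependent(Epoch id) { return id | kDependentBit; }
constexpr bool isDependent(Epoch e) { return (e & kDependentBit) != 0; }

// One destination, standing in for a node, objgroup, or collection element.
class Destination {
public:
  using Action = std::function<void()>;

  // In this toy model, non-dependent epochs count as always released.
  bool isReleased(Epoch epoch) const {
    return !isDependent(epoch) || released_.count(epoch) != 0;
  }

  // Deliver immediately unless the epoch is dependent and unreleased here.
  void receive(Epoch epoch, std::string payload) {
    if (isReleased(epoch)) {
      delivered_.push_back(std::move(payload));
    } else {
      held_[epoch].messages.push_back(std::move(payload));
    }
  }

  // Run the action now if the epoch is released, otherwise queue it.
  void whenReleased(Epoch epoch, Action action) {
    if (isReleased(epoch)) {
      action();
    } else {
      held_[epoch].actions.push_back(std::move(action));
    }
  }

  // Release a dependent epoch: flush held messages, then queued actions.
  // (That ordering is a choice made for this sketch.)
  void release(Epoch epoch) {
    if (!isDependent(epoch)) return;              // nothing to release
    if (!released_.insert(epoch).second) return;  // already released
    auto it = held_.find(epoch);
    if (it == held_.end()) return;
    Pending pending = std::move(it->second);
    held_.erase(it);  // erase first so actions may safely re-enter
    for (auto& m : pending.messages) delivered_.push_back(std::move(m));
    for (auto& a : pending.actions) a();
  }

  const std::vector<std::string>& delivered() const { return delivered_; }

private:
  struct Pending {
    std::vector<std::string> messages;
    std::vector<Action> actions;
  };
  std::unordered_map<Epoch, Pending> held_;
  std::unordered_set<Epoch> released_;
  std::vector<std::string> delivered_;
};

// Usage: a message arrives before the destination is ready for it.
std::vector<std::string> demo() {
  Destination node1;
  const Epoch dep = makeDependent(1);
  node1.receive(dep, "early message");  // buffered, not delivered
  assert(node1.delivered().empty());
  node1.release(dep);                   // flushes the buffer
  return node1.delivered();
}
```

The sender never inspects the destination's state in this model; only the destination decides when held messages become visible, which is the property the synopsis attributes to dependent epochs.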
56a6638
to
eea3771
@PhilMiller @pnstickne @nlslatt @uhetmaniuk I just rebased this and fixed a number of problems, since the code has changed a lot. I would like to merge this into develop soon so we can move on from it.
This is the full implementation of dependent epochs. TD dump prints on hang could be improved.