410 Dependent Epochs #423

Closed
wants to merge 46 commits into from

Conversation

lifflander
Collaborator

@lifflander lifflander commented Aug 16, 2019

This is the full implementation of dependent epochs. The termination detection (TD) dump printed on a hang could still be improved.

  • Added a bit to envelope for marking system messages
  • Implemented dependent epochs for:
    • General active message handlers
    • Virtual Collection Elements (with whole collection release)
      • Release, check release, and add action on release occur on the indexed proxy
    • ObjGroups
  • Integration/Unit Tests for Dependent Epochs
    • General active message handler test
    • ObjGroup test
    • Virtual Collection test

@PhilMiller
Member

Is there documentation somewhere on the intended semantics this is meant to provide?

For ease of review and timeliness of integration, could you post pull requests for the pieces of this that are obvious improvements, rather than part of the complicated new feature? E.g. the IntegralSet bits and the parent/child handling cleanup, ideally each separately?

@lifflander
Collaborator Author

@PhilMiller I could try to pull them apart. In this case, it may be more effort than the value it provides. Dependent epochs were the driving force behind the improvements because of how often lookups occur with them. Much of the development is interleaved.

I will work on some documentation now for this.

@lifflander
Collaborator Author

I will write better documentation in the code in the future. Here's a synopsis of the new feature:

Regular, non-dependent epochs always deliver messages with a given (rooted or collective) epoch in the envelope immediately. However, the user can hold back actions until an epoch terminates and thus sequence events after that termination.

Epoch creation (collective or rooted) is an asynchronous operation by design. Thus, a collective creation of an epoch can occur on one node much earlier than another and messages with that new collective epoch can be delivered immediately---before another node even participates in the new collective epoch construction! This behavior is a feature rather than a bug---it makes the whole execution more asynchronous.

Dependent epochs allow the user to actually hold back delivery anywhere in the system when a message arrives. This allows nodes to independently make progress and send messages without considering the current state of another node (whether the node is "ready", or in the right state, to accept them).

Dependent epochs are categorized statically in the system by setting a high bit in the EpochType bit field. Thus, all nodes and other entities know if an epoch is dependent without any coordination or communication---just by checking that bit.
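
The static categorization described above can be sketched as a single flag bit in an integer epoch identifier. This is only an illustration, not vt's actual layout: the 64-bit width, the choice of bit 63, and the names EpochBits, makeDependent, isDependent, and sequenceOf are all assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for vt's EpochType; the real width and layout may differ.
using EpochBits = std::uint64_t;

// Assumed position of the "dependent" flag: the highest bit of the field.
constexpr EpochBits dependent_bit = EpochBits{1} << 63;

// Mark an epoch identifier as dependent by setting the flag bit.
constexpr EpochBits makeDependent(EpochBits epoch) {
  return epoch | dependent_bit;
}

// Any node can classify an epoch locally, with no communication.
constexpr bool isDependent(EpochBits epoch) {
  return (epoch & dependent_bit) != 0;
}

// Strip the flag to recover the remaining identifier bits.
constexpr EpochBits sequenceOf(EpochBits epoch) {
  return epoch & ~dependent_bit;
}
```

Because the check is a pure function of the identifier, every node reaches the same answer for the same epoch without exchanging any state.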

By default, dependent epochs are not "released" on a node, objgroup, or virtual collection element. Thus, messages carrying a dependent epoch will not be delivered on arrival. The system automatically buffers them at the destination until the epoch is released there.
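
The buffer-until-release behavior can be modeled in a few lines of plain C++. This is a sketch of the idea only and does not reflect how vt implements it: ReleaseBuffer, deliver, release, isReleased, and numPending are hypothetical names, every epoch in the model is treated as dependent, and a "message" is reduced to a callable handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Epoch = std::uint64_t;           // hypothetical epoch identifier
using Action = std::function<void()>;  // stands in for a message handler

// Models one destination (a node, objgroup, or collection element).
class ReleaseBuffer {
public:
  // Run the handler now if the epoch is released; otherwise hold it back.
  void deliver(Epoch epoch, Action handler) {
    assert(handler && "handler must be callable");
    if (released_.count(epoch) != 0) {
      handler();
    } else {
      pending_[epoch].push_back(std::move(handler));
    }
  }

  // Release the epoch and run everything buffered for it, in arrival order.
  void release(Epoch epoch) {
    released_.insert(epoch);
    auto iter = pending_.find(epoch);
    if (iter == pending_.end()) return;
    // Move the queue out first so handlers may safely call deliver().
    std::vector<Action> ready = std::move(iter->second);
    pending_.erase(iter);
    for (auto& action : ready) action();
  }

  bool isReleased(Epoch epoch) const { return released_.count(epoch) != 0; }

  std::size_t numPending(Epoch epoch) const {
    auto iter = pending_.find(epoch);
    return iter == pending_.end() ? 0 : iter->second.size();
  }

private:
  std::unordered_set<Epoch> released_;
  std::unordered_map<Epoch, std::vector<Action>> pending_;
};
```

In this model a sender never needs to know whether the destination is ready: handlers delivered before the release wait in the buffer, and handlers delivered afterwards run immediately.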

Creating a dependent epoch is analogous to creating a non-dependent epoch:

auto node = vt::theContext()->getNode();
auto epoch = vt::theTerm()->makeEpochCollectiveDep();

if (node == 0) {
    auto msg = vt::makeSharedMessage<TestMsg>();
    vt::theMsg()->sendMsg<TestMsg, myHandler>(1, msg);
}

vt::theTerm()->finishedEpoch(epoch);

In this case, myHandler will not be triggered on node 1 until a release occurs:

if (node == 1) {
    vt::theTerm()->releaseEpoch(epoch);
}

You can also enqueue an action, or check if it's been released:

if (node == 1) {
    vt::theTerm()->onReleaseEpoch(epoch, []{});
    bool test = vt::theTerm()->epochReleased(epoch);
}

This same logic applies to objgroups and virtual collections:

auto proxy = vt::theCollection()->constructCollective(range);

// Release/check a specific element, may be remote
proxy[3].release(epoch);
proxy[3].whenReleased(epoch, []{});
bool test = proxy[3].isReleased(epoch);

// Release the epoch across the whole collection
proxy.release(epoch);
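
The difference between releasing one element and releasing the whole collection can be modeled as per-element release state plus a queue of actions waiting on each epoch. This is a local, single-process sketch under stated assumptions: CollectionModel and its methods are hypothetical names, and the real proxy calls shown above may target remote elements, which this model ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Epoch = std::uint64_t;           // hypothetical epoch identifier
using Action = std::function<void()>;  // action to run once released

// Models a collection whose elements track release state independently.
class CollectionModel {
public:
  explicit CollectionModel(std::size_t num_elements)
    : elements_(num_elements) {}

  // Release the epoch on a single element and run its waiting actions.
  void release(std::size_t index, Epoch epoch) {
    assert(index < elements_.size());
    Element& elm = elements_[index];
    elm.released.insert(epoch);
    auto iter = elm.waiting.find(epoch);
    if (iter == elm.waiting.end()) return;
    // Move the actions out before running them so they may re-enter safely.
    std::vector<Action> ready = std::move(iter->second);
    elm.waiting.erase(iter);
    for (auto& action : ready) action();
  }

  // Release the epoch on every element of the collection.
  void releaseAll(Epoch epoch) {
    for (std::size_t i = 0; i < elements_.size(); ++i) release(i, epoch);
  }

  // Run the action once the epoch is released on the given element.
  void whenReleased(std::size_t index, Epoch epoch, Action action) {
    assert(index < elements_.size());
    Element& elm = elements_[index];
    if (elm.released.count(epoch) != 0) {
      action();
    } else {
      elm.waiting[epoch].push_back(std::move(action));
    }
  }

  bool isReleased(std::size_t index, Epoch epoch) const {
    assert(index < elements_.size());
    return elements_[index].released.count(epoch) != 0;
  }

private:
  struct Element {
    std::set<Epoch> released;
    std::map<Epoch, std::vector<Action>> waiting;
  };
  std::vector<Element> elements_;
};
```

Releasing an element that is already released is a no-op in this model, so a whole-collection release after some per-element releases does not run any action twice.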

@lifflander
Collaborator Author

lifflander commented Aug 19, 2019

I've split out the integral sets only: #424
I've split out the epoch parent/child relation: #425

@lifflander lifflander modified the milestone: 1.0.0-beta Oct 10, 2019
@lifflander
Collaborator Author

@PhilMiller @pnstickne @nlslatt @uhetmaniuk I just rebased this and fixed a bunch of problems as the code has changed a lot. I would like to merge this soon into develop so we can move on from it.

@cz4rs cz4rs linked an issue Sep 6, 2021 that may be closed by this pull request
@lifflander lifflander closed this Sep 6, 2021
Development

Successfully merging this pull request may close these issues.

Semantics of collection chain set wrt nextStepCollective
2 participants