410 Dependent Epochs #423

lifflander · 2019-08-16T00:50:36Z

This is the full implementation of dependent epochs. TD dump prints on hang could be improved.

Added a bit to the envelope for marking system messages
Implemented dependent epochs for:
- General active message handlers
- Virtual collection elements (with whole-collection release)
  - Release, check release, and add action on release occur on the indexed proxy
- ObjGroups
Integration/unit tests for dependent epochs:
- General active message handler test
- ObjGroup test
- Virtual collection test

PhilMiller · 2019-08-16T03:52:31Z

Is there documentation somewhere on the intended semantics this is meant to provide?

For ease of review and timeliness of integration, could you post pull requests for the pieces of this that are obvious improvements, rather than part of the complicated new feature? E.g. the IntegralSet bits and the parent/child handling cleanup, ideally each separately?

lifflander · 2019-08-19T17:13:10Z

@PhilMiller I could try to pull them apart. In this case, it may be more effort than the value it provides. Dependent epochs were the driving force behind the improvements because of how often lookups occur with them. Much of the development is interleaved.

I will work on some documentation now for this.

lifflander · 2019-08-19T18:58:54Z

I will write better documentation in the code in the future. Here's a synopsis of the new feature:

Regular, non-dependent epochs always deliver messages with a given (rooted or collective) epoch in the envelope immediately. However, the user can hold back actions until an epoch terminates and thus sequence events after that termination.

Epoch creation (collective or rooted) is an asynchronous operation by design. Thus, a collective creation of an epoch can occur on one node much earlier than another and messages with that new collective epoch can be delivered immediately---before another node even participates in the new collective epoch construction! This behavior is a feature rather than a bug---it makes the whole execution more asynchronous.

Dependent epochs allow the user to actually hold back delivery anywhere in the system when a message arrives. This allows nodes to independently make progress and send messages without considering the current state of another node (whether the node is "ready", or in the right state, to accept them).

Dependent epochs are categorized statically in the system by setting a high bit in the EpochType bit field. Thus, all nodes and other entities know if an epoch is dependent without any coordination or communication---just by checking that bit.
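The local check described above can be sketched as follows. This is a minimal illustration, not the actual vt layout: the 64-bit width, the choice of bit 63, and the names `dependent_bit`, `makeDependent`, and `isDependent` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for vt's EpochType; the real width and layout may differ.
using EpochType = std::uint64_t;

// Assumption for illustration: the top bit marks an epoch as dependent.
constexpr EpochType dependent_bit = EpochType{1} << 63;

// Tag an epoch as dependent by setting the category bit.
constexpr EpochType makeDependent(EpochType epoch) {
  return epoch | dependent_bit;
}

// Purely local test: any node can answer this without communication.
constexpr bool isDependent(EpochType epoch) {
  return (epoch & dependent_bit) != 0;
}
```

Because the category travels inside the epoch value itself, a destination that receives a message only needs the epoch from the envelope to decide whether buffering applies.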

By default, a dependent epoch is not "released" on a node, objgroup, or virtual collection element, so messages sent under it will not be delivered there. The system automatically buffers those messages at the destination until the epoch is released.
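A rough sketch of the buffer-until-release behavior at one destination might look like this. `ReleaseBuffer` and its members are hypothetical names for illustration and do not reflect the data structures in this PR; treating a repeated release as a no-op follows the "do not fail on multiple release" commit.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using EpochType = std::uint64_t;
using Action = std::function<void()>;

// Hypothetical per-destination buffer; not vt's actual implementation.
class ReleaseBuffer {
public:
  // Run the delivery now if the epoch is released; otherwise hold it.
  void deliver(EpochType epoch, Action action) {
    if (released_.count(epoch) != 0) {
      action();
    } else {
      pending_[epoch].push_back(std::move(action));
    }
  }

  // Release the epoch and flush held deliveries in arrival order.
  // A second release of the same epoch does nothing.
  void release(EpochType epoch) {
    if (!released_.insert(epoch).second) {
      return;
    }
    auto iter = pending_.find(epoch);
    if (iter == pending_.end()) {
      return;
    }
    // Move the actions out first so a flushed action may call deliver().
    auto actions = std::move(iter->second);
    pending_.erase(iter);
    for (auto& action : actions) {
      action();
    }
  }

  bool isReleased(EpochType epoch) const {
    return released_.count(epoch) != 0;
  }

private:
  std::unordered_set<EpochType> released_;
  std::unordered_map<EpochType, std::vector<Action>> pending_;
};
```

In this sketch, anything delivered after the release runs immediately, which matches the behavior of a regular epoch from that point on.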

Creating a dependent epoch is analogous to creating a non-dependent epoch:

auto epoch = vt::theTerm()->makeEpochCollectiveDep();

if (node == 0) {
    auto msg = vt::makeSharedMessage<TestMsg>();
    vt::theMsg()->sendMsg<TestMsg, myHandler>(1, msg);
}

vt::theTerm()->finishedEpoch(epoch);

In this case, myHandler will not be triggered on node 1 until a release occurs:

if (node == 1) {
    vt::theTerm()->releaseEpoch(epoch);
}

You can also enqueue an action, or check if it's been released:

if (node == 1) {
    vt::theTerm()->onReleaseEpoch(epoch, []{});
    bool test = vt::theTerm()->epochReleased(epoch);
}

This same logic applies to objgroups and virtual collections:

auto proxy = vt::theCollection()->constructCollective(range);

// Release/check a specific element, may be remote
proxy[3].release(epoch);
proxy[3].whenReleased(epoch, []{});
bool test = proxy[3].isReleased(epoch);

// Release the epoch across the whole collection
proxy.release(epoch);
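One way to picture how per-element and whole-collection release could interact is the bookkeeping below. It is only a sketch under assumptions: `CollectionReleaseState`, the `int` index type, and the rule that a whole-collection release covers every element are illustrative and not taken from the PR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using EpochType = std::uint64_t;
using Index = int;  // hypothetical 1-D element index

// Hypothetical release bookkeeping for one collection.
class CollectionReleaseState {
public:
  // Release an epoch on a single element, as with proxy[idx].release(epoch).
  void releaseElement(Index idx, EpochType epoch) {
    per_element_[idx].insert(epoch);
  }

  // Release an epoch on every element, as with proxy.release(epoch).
  void releaseAll(EpochType epoch) {
    whole_collection_.insert(epoch);
  }

  // An element counts as released if either form of release applies to it.
  bool isReleased(Index idx, EpochType epoch) const {
    if (whole_collection_.count(epoch) != 0) {
      return true;
    }
    auto iter = per_element_.find(idx);
    return iter != per_element_.end() && iter->second.count(epoch) != 0;
  }

private:
  std::unordered_set<EpochType> whole_collection_;
  std::unordered_map<Index, std::unordered_set<EpochType>> per_element_;
};
```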

lifflander · 2019-08-19T20:03:07Z

I've split out the integral sets only: #424
I've split out the epoch parent/child relation: #425

lifflander · 2019-12-18T19:41:48Z

@PhilMiller @pnstickne @nlslatt @uhetmaniuk I just rebased this and fixed a bunch of problems as the code has changed a lot. I would like to merge this soon into develop so we can move on from it.

lifflander requested review from PhilMiller and nlslatt August 16, 2019 00:50

lifflander modified the milestone: 1.0.0-beta Oct 10, 2019

lifflander added 21 commits December 17, 2019 22:07

#410: epoch: change unused InsertEpoch to DependentEpoch

b4e38a6

#410: epoch: add function to bit-combine epoch category bits

8bf9cd5

#410: env: add new system-type header bit to standard envelope

9328add

#410: epoch: add direct category test functions

64d2736

#410: term: use new direct epoch category tests

a592b1a

#410: term: implement dependent epochs

eb17697

#410: test: add release dependent epoch test

f91b3a3

#410: term: broaden window archetype lookup criteria

63880ff

#410: active: re-route obj message w/dependent epochs

bf31a63

#410: term: add test for local term

a9d575e

#410: term: implement epoch category map

7e1af03

#410: term: implement epoch release set

db75d97

#410: objgroup: implement dependent epoch release sets

e5029ff

#410: term: fix category map insert

a1c7dd6

#410: term: add released prints

275a41c

#410: term: add prints, and fix iterator hint

bed1655

#410: active: fix bad indent

c8ff95e

#410: active: fix incorrect from node causing hang

37a8385

#410: vrt coll: start writing collection release

1af6818

#410: objgroup: add missing sytem msg header

22430a8

#410: epoch: implement dependent epochs for vrt coll

6128370

lifflander added 22 commits December 17, 2019 22:19

#410: location: set system type on loc meta msgs

5ab5612

#410: group: set system on meta msgs

e864b24

#410: epoch: set system on release handler msg

59c2c7e

#410: vrt coll: check is_sys in dep epoch buffer code

b110283

#410: example: add dep epoch example for objgroup and vrt coll

adf8a45

#410: epoch: add broadcast of release to vrt coll

2dd5b2b

#410: epoch: use plain function for broadcast due to base type

570ad4a

#410: epoch: do not fail on multiple release

e37cb4c

#410: epoch: only erase if it exists

4ac8393

#410: example: small improvement

07f4d48

#410: example: add alternative release as comment

123c1c2

#410: epoch: fix type in release msg

412662b

#410: test: implement objgroup test for dependent epochs

8398717

#410: vrt coll: add system type of vrt coll msg

b9d65a2

#410: vrt coll: fix missing system type markings

f58fe7e

#410: test: add new test for dep epochs and collections

9eaa964

#410: test: remove too verbose debug prints

3476e50

#410: term: fix merge problems with new TD code

12b2a57

#410: test: fix a bug in the AM dep epoch test

f161fee

#628: pending send: fix bug, need produce/consume w/nextStep

7e1c133

#410: test: fix more bugs in the tests

7bab11f

#410: pending send: more improvements for serialized msgs

eea3771

lifflander force-pushed the 410-dependent-epochs branch from 56a6638 to eea3771 Compare December 18, 2019 19:40

lifflander requested review from pnstickne and uhetmaniuk December 18, 2019 19:41

#410: clean up condition and add explicit

7cf893c

cz4rs linked an issue Sep 6, 2021 that may be closed by this pull request

Semantics of collection chain set wrt nextStepCollective #410

Open

lifflander closed this Sep 6, 2021
