Business Card Exchange for Process-to-Process Wire-up #191

jjhursey · 2019-06-01T16:02:17Z

Brief Description

Multi-process communication libraries, such as MPI, need to establish communication channels among a set of processes. Each process needs to share connectivity information (a.k.a. Business Cards) with all other processes before communication channels can be established. The runtime environment must provide a mechanism for the efficient exchange of this connectivity information. Additional information about the current state of the job (e.g., number of processes globally and locally) and about how the process was started (e.g., process binding) is also helpful.

Use Case Details

Note: The Instant-On wire-up mechanism is a separate, related use case.

Multi-process communication libraries, such as MPI, need to establish communication channels among a set of processes. Each process needs to share connectivity information (a.k.a. Business Cards) with all other processes before communication channels can be established. This connectivity information may take the form of one or more unique strings that allow a different process to establish a communication channel with the originator.

Each process provides its business card to PMIx via one or more PMIx_Put operations, each of which stores a tuple of {UID, key, value}. The UID is the unique name for this process in the PMIx universe (i.e., namespace and rank). The key is a name that other processes can reference generically (note that since the UID is also associated with the key, there is no need to make the key uniquely named per process). The value is the string representation of the connectivity information.
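The {UID, key, value} tuple can be illustrated with a small in-memory model. This is only a sketch of the data layout described above, not PMIx code: the names `bc_store_t`, `bc_put`, and `bc_get`, the fixed sizes, and the example key `tcp.addr` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BC_MAX_ENTRIES 64   /* hypothetical capacity of the toy store */
#define BC_MAX_STR     64   /* hypothetical maximum string length, including NUL */

/* One stored tuple: {UID = (nspace, rank), key, value}. */
typedef struct {
    char     nspace[BC_MAX_STR];
    unsigned rank;
    char     key[BC_MAX_STR];
    char     value[BC_MAX_STR];
} bc_entry_t;

typedef struct {
    bc_entry_t entries[BC_MAX_ENTRIES];
    size_t     count;
} bc_store_t;

/* Look up the value stored under (nspace, rank, key); NULL if absent. */
const char *bc_get(const bc_store_t *s, const char *nspace,
                   unsigned rank, const char *key)
{
    assert(s && nspace && key);
    for (size_t i = 0; i < s->count; i++) {
        const bc_entry_t *e = &s->entries[i];
        if (e->rank == rank && strcmp(e->nspace, nspace) == 0 &&
            strcmp(e->key, key) == 0) {
            return e->value;
        }
    }
    return NULL;
}

/* Store a value under (nspace, rank, key), overwriting an existing entry.
 * Returns 0 on success, -1 if the store is full or a string is too long. */
int bc_put(bc_store_t *s, const char *nspace, unsigned rank,
           const char *key, const char *value)
{
    assert(s && nspace && key && value);
    if (strlen(nspace) >= BC_MAX_STR || strlen(key) >= BC_MAX_STR ||
        strlen(value) >= BC_MAX_STR) {
        return -1;
    }
    for (size_t i = 0; i < s->count; i++) {
        bc_entry_t *e = &s->entries[i];
        if (e->rank == rank && strcmp(e->nspace, nspace) == 0 &&
            strcmp(e->key, key) == 0) {
            strcpy(e->value, value);   /* same UID and key: replace the value */
            return 0;
        }
    }
    if (s->count == BC_MAX_ENTRIES) {
        return -1;
    }
    bc_entry_t *e = &s->entries[s->count++];
    strcpy(e->nspace, nspace);
    e->rank = rank;
    strcpy(e->key, key);
    strcpy(e->value, value);
    return 0;
}
```

Because the UID is part of the lookup, rank 0 and rank 1 can both store a value under the same key (for example `tcp.addr`) without colliding, which is the point made in the parenthetical above.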

Some business card information is meant for remote processes (e.g., TCP or InfiniBand addresses), while other information is meant only for local processes (e.g., shared memory information). As such, a scope should be associated with the PMIx_Put operation to differentiate this intention.
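The effect of a scope can be sketched as a visibility rule. The enum and function below are hypothetical names for illustration; they are loosely modeled on the local, remote, and global scope values that PMIx defines for put operations, but this is a toy predicate, not the library's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scope values for the sketch. */
typedef enum {
    BC_SCOPE_LOCAL,   /* only for processes on the writer's node */
    BC_SCOPE_REMOTE,  /* only for processes on other nodes */
    BC_SCOPE_GLOBAL   /* for all processes */
} bc_scope_t;

/* Should a value stored with `scope` be visible to a reader that is
 * (same_node == true) or is not (same_node == false) on the writer's node? */
bool bc_visible(bc_scope_t scope, bool same_node)
{
    switch (scope) {
    case BC_SCOPE_LOCAL:
        return same_node;
    case BC_SCOPE_REMOTE:
        return !same_node;
    case BC_SCOPE_GLOBAL:
        return true;
    }
    assert(0 && "unknown scope");
    return false;
}
```

Under this rule a shared memory segment name would be stored with the local scope, while a TCP or InfiniBand address would be stored with the remote (or global) scope.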

The PMIx_Put operations may be cached local to the process. Once all PMIx_Put operations have been called, each process should call PMIx_Commit to push those values to the local PMIx server. Note that in a multi-library configuration each library may PMIx_Put and then PMIx_Commit its own values, so there may be multiple PMIx_Commit calls before a Business Card Exchange is activated.

After calling PMIx_Commit a process can activate the Business Card Exchange collective operation by calling PMIx_Fence. The PMIx_Fence operation is collective over the set of processes specified in the argument set. That allows for the collective to span a subset of a namespace or multiple namespaces. After the completion of the PMIx_Fence operation, the data PMIx_Put by other processes is available to the local process through a call to PMIx_Get which returns the key/value pairs necessary to establish the connection(s) with the other processes.
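The put, commit, and fence stages described above can be modeled as counters that move values from one stage to the next. This is a minimal sketch with hypothetical names (`bc_pipeline_t`, `bc_model_put`, `bc_model_commit`, `bc_model_fence`); it tracks only how many values sit at each stage and makes no claim about how a PMIx implementation stores them.

```c
#include <assert.h>
#include <stddef.h>

/* Number of values at each stage for one process (toy model). */
typedef struct {
    size_t cached;     /* put, still only in the process-local cache */
    size_t at_server;  /* pushed to the local server by a commit */
    size_t exchanged;  /* made available to peers by a completed fence */
} bc_pipeline_t;

/* A put adds one value to the process-local cache. */
void bc_model_put(bc_pipeline_t *p)
{
    assert(p);
    p->cached++;
}

/* A commit pushes everything cached so far to the local server. */
void bc_model_commit(bc_pipeline_t *p)
{
    assert(p);
    p->at_server += p->cached;
    p->cached = 0;
}

/* A fence makes everything committed so far available to peers.
 * Values that were put but never committed stay in the cache. */
void bc_model_fence(bc_pipeline_t *p)
{
    assert(p);
    p->exchanged += p->at_server;
    p->at_server = 0;
}
```

In the multi-library case, two libraries that each put and commit before the fence both end up with their values in `exchanged`, while a value that was put but not committed is left behind in `cached`.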

The PMIx_Fence operation must have a "Synchronize Only" mode that works as a barrier operation. This is helpful if the communication library requires a synchronization before leaving initialization or starting finalization, for example.

The PMIx_Fence operation should have a "Sparse" mode in addition to a "Full" mode for the data exchange. The "Full" mode will fully exchange all Business Card information to all other processes. This is helpful for tightly communicating applications. The "Sparse" mode will dynamically pull the connectivity information on demand from inside of PMIx_Get (if it is not already available locally). This is helpful for sparsely communicating applications. Since the PMIx library cannot infer which mode is best for an application, the caller must specify the mode that works best for it.

The PMIx_Fence operation should have an option for the end user to specify which mode they desire for this operation.
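The trade-off between the two modes can be sketched by counting on-demand fetches. The model below uses hypothetical names (`bc_cache_t`, `bc_model_exchange`, `bc_model_lookup`) and assumes that a card pulled on demand is cached locally afterwards; it is an illustration of the behavior described above, not a description of any PMIx implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BC_MAX_PEERS 128   /* hypothetical limit for the toy model */

typedef enum { BC_MODE_FULL, BC_MODE_SPARSE } bc_mode_t;

typedef struct {
    bool   have_card[BC_MAX_PEERS]; /* is peer i's card available locally? */
    size_t npeers;
    size_t remote_fetches;          /* on-demand pulls triggered by lookups */
} bc_cache_t;

/* Model the fence: "Full" delivers every card up front, "Sparse" delivers none. */
void bc_model_exchange(bc_cache_t *c, size_t npeers, bc_mode_t mode)
{
    assert(c && npeers <= BC_MAX_PEERS);
    c->npeers = npeers;
    c->remote_fetches = 0;
    for (size_t i = 0; i < npeers; i++) {
        c->have_card[i] = (mode == BC_MODE_FULL);
    }
}

/* Model a get: if the card is not local, pull it on demand and cache it.
 * Returns true if a remote fetch was needed. */
bool bc_model_lookup(bc_cache_t *c, size_t peer)
{
    assert(c && peer < c->npeers);
    if (c->have_card[peer]) {
        return false;
    }
    c->remote_fetches++;
    c->have_card[peer] = true;
    return true;
}
```

A process that talks to only two of 100 peers pays for two fetches in the sparse model and none in the full model, but the full model has already moved all 100 cards to every process during the exchange.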

Additional information about the current state of the job (e.g., number of processes globally and locally) and about how the process was started (e.g., process binding) is also helpful. This "job level" information must be available immediately after PMIx_Init without the need for any explicit synchronization.

The number of processes globally in the namespace and this process's rank within that namespace are important to know before establishing the Business Card information in order to best allocate resources.

The number of processes local to the node and this process's local rank are important to know before establishing the Business Card information to help the caller determine the scope of the put operation. For example, the caller can use them to designate a leader that sets up a shared memory segment of the proper size before putting that information into the locally scoped Business Card information.
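The shared memory example can be sketched as two small helpers. The names are hypothetical, and two conventions are assumed here rather than taken from the text above: the process with local rank 0 acts as the leader, and the segment holds a fixed number of bytes per local process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: the lowest local rank on a node is the leader. */
bool bc_is_local_leader(unsigned local_rank)
{
    return local_rank == 0;
}

/* Size of a shared memory segment with one fixed-size slot per local process.
 * `local_size` is the number of processes on this node. */
size_t bc_shm_segment_bytes(unsigned local_size, size_t bytes_per_proc)
{
    assert(local_size > 0);
    return (size_t)local_size * bytes_per_proc;
}
```

With 4 local processes and 4096 bytes per process, the leader would create a 16384-byte segment and then store its name with a local scope so that only processes on the same node see it.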

The number of processes local to a remote node is also helpful to know before establishing the Business Card information. This information is useful to pre-establish local resources before that remote node starts to initiate a connection or to determine the number of connections that need to be advertised in the Business Card when it is sent out.

Note that some of the job level information may change over the course of the job in a dynamic application.

Interfaces

PMIx_Put
PMIx_Get
PMIx_Commit
PMIx_Fence
PMIx_Init

Keys

The following job level information is useful to have before establishing Business Card information:

PMIX_NODE_LIST: List of nodes in the job
PMIX_NUM_NODES: Number of nodes in the job
PMIX_NODEID: Node ID where this process is located
PMIX_JOB_SIZE: Number of processes globally in the job
PMIX_PROC_MAP: Mapping of processes to nodes in this job
PMIX_LOCAL_PEERS: List of local processes on this node in this job
PMIX_LOCAL_SIZE: Number of processes local to this node

For each process this information is also useful (note that any one process may want to access this list of information about any other process in the system):

PMIX_RANK: My global rank in the job
PMIX_LOCAL_RANK: My local rank on this node for this job
PMIX_GLOBAL_RANK: My global rank across all namespaces
PMIX_LOCALITY_STRING: Process binding on this node
PMIX_HOSTNAME: Hostname associated with this process (useful for queries about remote processes)

There are other keys that are helpful to have before a synchronization point; this is not meant to be a comprehensive list.

References

PMI v2 link
PMIx: Process management for exascale environments (Nov. 2018) DOI

jjhursey · 2019-06-01T16:03:03Z

Note: Still a Work In Progress. There are some more details that I want to add about job level information and the behavior of put/get.

SteVwonder · 2019-06-05T02:22:34Z

Thanks @jjhursey for putting this together. As you mention, MPI has this use-case. Presumably SHMEM does as well. Are there other popular libraries that we can list as having this use-case?

jjhursey · 2019-06-05T15:51:31Z

I updated the description with a few of the specific job/process level information items. I'll take the WIP tag off this so we can have a discussion and further expand on this.

jjhursey · 2019-06-05T16:53:10Z

For a list of the job/process level information items, see section 11.1.3 (PMIx_server_register_nspace). Document Link

Suggestion from the teleconf:

Add a marking for optional vs required. For each, a small explanation, especially for the optional values, explain the use case for it and the implications (e.g., performance or design/workaround) if it is not provided.

kathrynmohror · 2019-06-06T14:31:46Z

There are lots of other use cases for this functionality, e.g. tools (I/O middleware, performance tools, etc.), programming model runtimes (MPI of course and others), probably anything that doesn't rely on MPI and wants to bootstrap communication.

This is kind of a meta point: Do we want to write all of these use cases with respect to how it is done in the current version of PMIx (i.e. naming the API calls to use), or do we want to simply map out the general functionality needed by the use case (independent of what PMIx does)? I thought we were doing the latter since in some cases we may be describing use cases for which there is no interface yet.

jjhursey · 2019-06-06T18:57:22Z

The goal of the use case is to present a description of a problem with guidance on how it might be solved in PMIx. The author can describe it in a broad sense or with specifics about interfaces they think would be helpful from PMIx.

The PMIx community then can help the author link it to existing PMIx interfaces that might address the use case. If new interfaces are needed then they can be worked out.

The idea was to have a low bar for suggesting use cases and engaging with the community (no need to be a PMIx expert). Then the PMIx community can engage to help see how it fits into what is currently defined by PMIx, and what might need to be defined still.

SteVwonder · 2019-06-06T19:23:31Z

Notes from the meeting yesterday relevant to this issue:

We may want to add a sub use-case under bootstrap for cloud services which require the exchange of authentication information.
It would be useful to add links to projects that have this use-case and what interfaces/attrs/keys they leverage. In the case of this use-case, OpenMPI and Spectrum MPI were mentioned.

This is kind of a meta point: Do we want to write all of these use cases with respect to how it is done in the current version of PMIx (i.e. naming the API calls to use), or do we want to simply map out the general functionality needed by the use case (independent of what PMIx does)?

I'm not sure exactly what we want the final form of the use-cases to look like, but I think the most useful would be both of what you describe: the general functionality/use-case and the specifics for each sub use-case (what does MPI require, what does a debugger require, what does a kubelet (or other cloud technology) require, etc). If a particular sub use-case is exactly the same as another (which may be the case for MPI and SHMEM, idk), then that can be called out too.

SteVwonder · 2020-01-23T02:20:19Z

Updated the original post with a snapshot of the use-case from our Google Drive drafts folder: https://drive.google.com/open?id=1eN7aBxyzPD0a_GJFq1KH2ZHpoONj76op

artpol84 · 2020-01-23T20:16:08Z

PMIx performance tool:

Tool directory: https://github.com/openpmix/openpmix/tree/master/contrib/perf_tools
PMIx implementation: https://github.com/openpmix/openpmix/blob/master/contrib/perf_tools/pmix.c

artpol84 · 2020-01-23T20:16:26Z

The above is an example of business card exchange.

jjhursey · 2020-03-02T23:01:07Z

I updated the Google Drive version of this use case to make a clearer distinction of the role of PMIx_Fence and PMIx_Get in the Direct Business Card Exchange and Full Business Card Exchange.

The new text starts in the paragraph starting with:

After the completion of the PMIx_Commit

Let me know what you think.

SteVwonder · 2020-03-11T03:54:15Z

Thanks @jjhursey! I left a comment in the google drive version too, but for anyone having trouble accessing that, my question was w.r.t. this sentence:

The PMIx_Fence operation takes an additional parameter indicating that a full exchange of business card information is requested of the runtime environment.

What is the "runtime environment" in this scenario? I presume the PMIx server, but I'm not sure.

jjhursey · 2020-03-16T19:57:59Z

I updated the google drive version to clarify that it is the RM that it is talking about.

jjhursey · 2023-01-12T15:57:40Z

The v5.0.x PR #328 included this use case. The issue will remain open for further discussion on this topic.

jjhursey added the WorkInProgress and Use Case labels Jun 1, 2019

jjhursey removed the WorkInProgress label Jun 5, 2019

SteVwonder mentioned this issue Jan 14, 2021

Add appendix with use case descriptions #328

Merged

jjhursey mentioned this issue Jan 12, 2023

Creating PMIx "classes/slices" based on functionality #182

Closed

jjhursey added this to the PMIx v5 Standard milestone Jan 12, 2023

jjhursey self-assigned this Jan 12, 2023

jjhursey removed their assignment Jan 12, 2023

pmix locked and limited conversation to collaborators Feb 9, 2023

raffenet converted this issue into discussion #438 Feb 9, 2023

This issue was moved to a discussion.
