
Enabled COMM_TYPE_SPLIT dependent on locality #326

Merged: 1 commit merged into open-mpi:master on Dec 24, 2014
Conversation

@zerothi (Contributor) commented Dec 24, 2014

Using the underlying hardware identification to split
communicators based on locality has been enabled using
the MPI_Comm_Split_Type function.

Currently implemented splits are:
HWTHREAD
CORE
L1CACHE
L2CACHE
L3CACHE
SOCKET
NUMA
NODE
BOARD
HOST
CU
CLUSTER

However, only NODE is defined in the standard, which is why the
remaining splits are referred to using the OMPI_ prefix instead
of the standard MPI_ prefix.

@hppritcha, @ggouaillardet I have tested this using --without-hwloc and --with-hwloc=<path>,
which both give the same output.

@ggouaillardet I tried moving the constants to mpi-ext.h; however, I ran into trouble when compiling, as the generic code then relied on mpi-ext.h. It also seems that a lot of OMPI_* constants which only have meaning for Open MPI are still placed in mpi.h. Hence I have not moved the constants to mpi-ext.h.

NOTE: I think something fishy is going on in the locality operators.
In my test program I couldn't get the correct split on these requests:
NUMA, SOCKET, L3CACHE
where I expected a full communicator but only got a single rank.
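
For illustration, a minimal C sketch of the intended usage (assuming the OMPI_COMM_TYPE_* extensions from this patch are present in mpi.h; of the locality splits, only the NODE split has a standard counterpart, MPI_COMM_TYPE_SHARED):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int wrank, lrank, lsize;
    MPI_Comm numa_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* One communicator per NUMA domain; key 0 for all ranks, so the
     * new ranks follow the MPI_COMM_WORLD ordering. */
    MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, 0,
                        MPI_INFO_NULL, &numa_comm);
    MPI_Comm_rank(numa_comm, &lrank);
    MPI_Comm_size(numa_comm, &lsize);
    printf("world rank %3d: NUMA-local rank %3d of %3d\n",
           wrank, lrank, lsize);

    MPI_Comm_free(&numa_comm);
    MPI_Finalize();
    return 0;
}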

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/153/
Test PASSed.

@hppritcha (Member) commented Dec 24, 2014

When you have a chance, could you add a simple test of this feature in ompi-tests/ibm/random, or add a link to one in this PR and I'll add it to ompi-tests?

@hppritcha self-assigned this Dec 24, 2014
hppritcha added a commit that referenced this pull request Dec 24, 2014
Enabled COMM_TYPE_SPLIT dependent on locality
@hppritcha merged commit 65c4f8d into open-mpi:master on Dec 24, 2014
@ggouaillardet (Contributor)

@zerothi @rhc54 I just pushed 9e9261e in order to fix this issue (pmix-specific).

@zerothi (Contributor, Author) commented Dec 26, 2014

@hppritcha I have tried searching for the ompi-tests repo, but to no avail.
I have added a link to a Fortran file which creates a communicator based on all split types:
comm_split.f90
I am not proficient in C, hence the Fortran test.

The link will only be temporarily available (for a year or so ;) )

@ggouaillardet thanks for the ABI correction.
@ggouaillardet I have tried the linked code on my local computer (with 2 cores, 4 threads, i7-2640M).
On that I get this print-out:

Comm using Cluster      Node:   0 local rank:   0 out of   2 ranks
Comm using CU           Node:   0 local rank:   0 out of   2 ranks
Comm using Host         Node:   0 local rank:   0 out of   2 ranks
Comm using Board        Node:   0 local rank:   0 out of   2 ranks
Comm using Node         Node:   0 local rank:   0 out of   2 ranks
Comm using Shared       Node:   0 local rank:   0 out of   2 ranks
Comm using Numa         Node:   0 local rank:   0 out of   1 ranks
Comm using Socket       Node:   0 local rank:   0 out of   1 ranks
Comm using L3           Node:   0 local rank:   0 out of   1 ranks
Comm using L2           Node:   0 local rank:   0 out of   1 ranks
Comm using L1           Node:   0 local rank:   0 out of   1 ranks
Comm using Core         Node:   0 local rank:   0 out of   1 ranks
Comm using HW           Node:   0 local rank:   0 out of   1 ranks

However, I had expected (only these changes):

Comm using Numa         Node:   0 local rank:   0 out of   2 ranks
Comm using Socket       Node:   0 local rank:   0 out of   2 ranks
Comm using L3           Node:   0 local rank:   0 out of   2 ranks
Comm using L2           Node:   0 local rank:   0 out of   2 ranks

The same thing happens if I bind/oversubscribe, etc. From your comment I gathered that NODE and L3 should show the same thing?
Anyway, the above is what I would expect the output to be.
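
For reference, a C sketch of the kind of loop the linked comm_split.f90 performs (a sketch only; names and output formatting are illustrative, not the actual test):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int types[] = {
        OMPI_COMM_TYPE_CLUSTER, OMPI_COMM_TYPE_CU, OMPI_COMM_TYPE_HOST,
        OMPI_COMM_TYPE_BOARD, OMPI_COMM_TYPE_NODE, MPI_COMM_TYPE_SHARED,
        OMPI_COMM_TYPE_NUMA, OMPI_COMM_TYPE_SOCKET, OMPI_COMM_TYPE_L3CACHE,
        OMPI_COMM_TYPE_L2CACHE, OMPI_COMM_TYPE_L1CACHE, OMPI_COMM_TYPE_CORE,
        OMPI_COMM_TYPE_HWTHREAD };
    const char *names[] = { "Cluster", "CU", "Host", "Board", "Node",
        "Shared", "Numa", "Socket", "L3", "L2", "L1", "Core", "HW" };
    int i, lrank, lsize;
    MPI_Comm split;

    MPI_Init(&argc, &argv);
    for (i = 0; i < (int)(sizeof(types) / sizeof(types[0])); i++) {
        /* Split MPI_COMM_WORLD by the given locality and report the
         * local rank and size of the resulting communicator. */
        MPI_Comm_split_type(MPI_COMM_WORLD, types[i], 0,
                            MPI_INFO_NULL, &split);
        MPI_Comm_rank(split, &lrank);
        MPI_Comm_size(split, &lsize);
        printf("Comm using %-12s local rank: %3d out of %3d ranks\n",
               names[i], lrank, lsize);
        MPI_Comm_free(&split);
    }
    MPI_Finalize();
    return 0;
}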

@ggouaillardet (Contributor)

@zerothi did you update ompi master with the latest changes and rebuild both ompi and your test program?

@zerothi (Contributor, Author) commented Dec 27, 2014

@ggouaillardet Ah, sorry, I misinterpreted the 9e9261e commit message; I will re-test immediately.

@zerothi (Contributor, Author) commented Dec 27, 2014

@ggouaillardet I have just retested with both hwloc and without. First, there is an error in the ABI commit 24df0ed.

The diff is here (OMPI_COMM_TYPE_NODE is defined just below the enum, and should not also be inside the enum construct):

diff --git a/ompi/include/mpi.h.in b/ompi/include/mpi.h.in
index 22a53e4..1c5194d 100644
--- a/ompi/include/mpi.h.in
+++ b/ompi/include/mpi.h.in
@@ -675,7 +675,6 @@ enum {
   OMPI_COMM_TYPE_L3CACHE,
   OMPI_COMM_TYPE_SOCKET,
   OMPI_COMM_TYPE_NUMA,
-  OMPI_COMM_TYPE_NODE,
   OMPI_COMM_TYPE_BOARD,
   OMPI_COMM_TYPE_HOST,
   OMPI_COMM_TYPE_CU,
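
For clarity, the intended layout in mpi.h after that fix would look roughly like this (a sketch, not the verbatim header), with the standard NODE split living outside the enum as an alias:

/* Sketch of the intended mpi.h layout (illustrative) */
enum {
  MPI_COMM_TYPE_SHARED,      /* the only split type in the MPI standard */
  OMPI_COMM_TYPE_HWTHREAD,
  OMPI_COMM_TYPE_CORE,
  OMPI_COMM_TYPE_L1CACHE,
  OMPI_COMM_TYPE_L2CACHE,
  OMPI_COMM_TYPE_L3CACHE,
  OMPI_COMM_TYPE_SOCKET,
  OMPI_COMM_TYPE_NUMA,
  OMPI_COMM_TYPE_BOARD,
  OMPI_COMM_TYPE_HOST,
  OMPI_COMM_TYPE_CU,
  OMPI_COMM_TYPE_CLUSTER
};
/* NODE is equivalent to the standard shared-memory split */
#define OMPI_COMM_TYPE_NODE MPI_COMM_TYPE_SHARED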

OK, so here is the output, both with and without hwloc:

Example of MPI_Comm_Split_Type

Currently using 4 nodes.

Comm using Cluster      Node:   0 local rank:   0 out of   4 ranks
Comm using CU           Node:   0 local rank:   0 out of   4 ranks
Comm using Host         Node:   0 local rank:   0 out of   4 ranks
Comm using Board        Node:   0 local rank:   0 out of   4 ranks
Comm using Node         Node:   0 local rank:   0 out of   4 ranks
Comm using Shared       Node:   0 local rank:   0 out of   4 ranks
Comm using Numa         Node:   0 local rank:   0 out of   1 ranks
Comm using Socket       Node:   0 local rank:   0 out of   1 ranks
Comm using L3           Node:   0 local rank:   0 out of   1 ranks
Comm using L2           Node:   0 local rank:   0 out of   1 ranks
Comm using L1           Node:   0 local rank:   0 out of   1 ranks
Comm using Core         Node:   0 local rank:   0 out of   1 ranks
Comm using HW           Node:   0 local rank:   0 out of   1 ranks

I have checked that my linked libraries are correct, and the output is the same.
Maybe I should say that I am linking against hwloc-1.10.0 and not the shipped hwloc version...
Any advice?

@rhc54 (Contributor) commented Dec 27, 2014

Why don’t you add --report-bindings to your cmd line and let’s see where the procs are actually being bound? It would also help to see your actual cmd line.


@zerothi (Contributor, Author) commented Dec 27, 2014

Sorry, my mistake. That was entirely the reason.
Binding to core creates the correct splitting! Thanks.
(The reason for the 4 ranks was oversubscription.)

[ntch-l0071:15610] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/..]
[ntch-l0071:15610] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB]
Example of MPI_Comm_Split_Type

Currently using 2 nodes.

Comm using Cluster      Node:   0 local rank:   0 out of   2 ranks
Comm using CU           Node:   0 local rank:   0 out of   2 ranks
Comm using Host         Node:   0 local rank:   0 out of   2 ranks
Comm using Board        Node:   0 local rank:   0 out of   2 ranks
Comm using Node         Node:   0 local rank:   0 out of   2 ranks
Comm using Shared       Node:   0 local rank:   0 out of   2 ranks
Comm using Numa         Node:   0 local rank:   0 out of   2 ranks
Comm using Socket       Node:   0 local rank:   0 out of   2 ranks
Comm using L3           Node:   0 local rank:   0 out of   2 ranks
Comm using L2           Node:   0 local rank:   0 out of   1 ranks
Comm using L1           Node:   0 local rank:   0 out of   1 ranks
Comm using Core         Node:   0 local rank:   0 out of   1 ranks
Comm using HW           Node:   0 local rank:   0 out of   1 ranks


@zerothi (Contributor, Author) commented Dec 27, 2014

And the cmd-line: mpirun -np 2 --bind-to core --report-bindings ./hwloc_comm_split


@jsquyres (Member) commented Jan 27, 2015

@zerothi Nick -- we were talking about this code at the Open MPI face-to-face dev meeting this week.

  1. Since this is a "large" patch, would you mind either a) saying here in a comment that you release this code under a BSD-compatible license, or b) signing a contribution agreement (http://www.open-mpi.org/community/contribute/)? (I'm guessing/assuming that just releasing the code under a BSD license might be your easiest option)
  2. George @bosilca and I were talking about the code today, and he had some good feedback and a suggestion to make it better. He's going to ping you in email with the suggestion.

@zerothi (Contributor, Author) commented Jan 27, 2015

Dear all,

I am happy that you consider it valuable enough to enter the Open MPI repo.

  1. Consider my contribution covered by a BSD-compatible license :)
    To be clear-cut:
    I release the contributed code referenced in this mail correspondence under
    a BSD-compatible license.
  2. I look forward to that :)


@jsquyres (Member)

@zerothi Thank you!

jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Sep 21, 2016
Correct the way we handle binding to objects during comm_spawn
dong0321 pushed a commit to dong0321/ompi that referenced this pull request Feb 19, 2020
Fix a segfault when generic MCA params are given