Enabled COMM_TYPE_SPLIT dependent on locality #326
Using the underlying hardware identification to split communicators based on locality has been enabled through the MPI_Comm_split_type function.
Currently implemented splits are: HWTHREAD, CORE, L1CACHE, L2CACHE, L3CACHE, SOCKET, NUMA, NODE, BOARD, HOST, CU, CLUSTER.
However, only NODE is defined in the standard, which is why the remaining splits use the OMPI_ prefix instead of the standard MPI_ prefix.
I have tested this using --without-hwloc and --with-hwloc=<path>, which both give the same output.
NOTE: I think something fishy is going on in the locality operators. In my test program I couldn't get the correct split for these requests: NUMA, SOCKET, L3CACHE, where I expected a full communicator but only got one.
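For readers following the thread, here is a minimal sketch of how such a split might be requested. It is not the author's test program (which is not shown in this conversation). It assumes an Open MPI build that contains this change, so that OMPI_COMM_TYPE_SOCKET, OMPI_COMM_TYPE_NUMA and OMPI_COMM_TYPE_L3CACHE are declared in mpi.h; the helper name report_split is purely illustrative.

```c
/* Sketch only: split MPI_COMM_WORLD by hardware locality and print the
 * size of each resulting communicator. Assumes an Open MPI build that
 * provides the OMPI_COMM_TYPE_* split types discussed in this PR. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper: split `comm` by `split_type` and report the result. */
static void report_split(MPI_Comm comm, int split_type, const char *name)
{
    MPI_Comm sub = MPI_COMM_NULL;
    int world_rank, sub_rank, sub_size;

    MPI_Comm_rank(comm, &world_rank);

    /* key = 0 for every process: ties are broken by rank in `comm`,
     * so the relative order of processes is preserved. Errors abort
     * by default (MPI_ERRORS_ARE_FATAL), so return codes are not checked. */
    MPI_Comm_split_type(comm, split_type, 0, MPI_INFO_NULL, &sub);

    if (sub == MPI_COMM_NULL) {
        printf("world rank %d: %s -> no communicator\n", world_rank, name);
        return;
    }

    MPI_Comm_rank(sub, &sub_rank);
    MPI_Comm_size(sub, &sub_size);
    printf("world rank %d: %s -> rank %d of %d\n",
           world_rank, name, sub_rank, sub_size);

    MPI_Comm_free(&sub);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* MPI_COMM_TYPE_SHARED is the split type defined by the MPI-3 standard. */
    report_split(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, "MPI_COMM_TYPE_SHARED");

#ifdef OPEN_MPI
    /* Open MPI-specific split types, hence the OMPI_ prefix.
     * Assumption: the installed Open MPI is recent enough to declare them. */
    report_split(MPI_COMM_WORLD, OMPI_COMM_TYPE_SOCKET,  "OMPI_COMM_TYPE_SOCKET");
    report_split(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA,    "OMPI_COMM_TYPE_NUMA");
    report_split(MPI_COMM_WORLD, OMPI_COMM_TYPE_L3CACHE, "OMPI_COMM_TYPE_L3CACHE");
#endif

    MPI_Finalize();
    return 0;
}
```

One plausible way to run it is `mpirun -np 4 --report-bindings ./a.out` (an illustrative command line, not the one used in this thread). The sizes reported for the finer-grained split types depend on where the processes are bound, so it helps to read them alongside the binding report.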
When you have a chance, could you add a simple test of this feature in ompi-tests/ibm/random, or add a link to one in this PR so that I can add it to ompi-tests?
Enabled COMM_TYPE_SPLIT dependent on locality
@hppritcha I tried searching for the ompi-tests repo, but to no avail. The link will only be temporarily available (for a year or so ;) ). @ggouaillardet Thanks for the ABI correction.
However, I had expected (only these changes):
The same thing happens if I bind/oversubscribe etc. From your comment I gathered that NODE and L3 should show the same thing?
@zerothi did you update ompi master with the latest changes and rebuild both ompi and your test program?
@ggouaillardet Ah, sorry, I misinterpreted the 9e9261e message; I will re-test immediately.
@ggouaillardet I have just retested both with and without hwloc. First, there is an error in the ABI commit 24df0ed. The diff is here (the TYPE_NODE is defined just below the
OK, so here is the output both with and without hwloc:
I have checked that my linked libraries are correct, and the output is the same.
Why don’t you add --report-bindings to your cmd line and let’s see where the procs are actually being bound? It would also help to see your actual cmd line.
Sorry, my mistake. That was entirely the reason.
2014-12-27 15:38 GMT+00:00 rhc54 [email protected]:
And the cmd-line 2014-12-27 16:11 GMT+00:00 Nick Papior Andersen [email protected]:
Kind regards Nick
@zerothi Nick -- we were talking about this code at the Open MPI face-to-face dev meeting this week.
Dear all, I am happy that you consider it valuable enough to enter the OpenMPI repo.
2015-01-27 18:25 GMT+00:00 Jeff Squyres [email protected]:
Kind regards Nick
@zerothi Thank you!
Correct the way we handle binding to objects during comm_spawn
Fix a segfault when generic MCA params are given
Using the underlying hardware identification to split
communicators based on locality has been enabled using
the MPI_Comm_split_type function.
Currently implemented splits are:
HWTHREAD
CORE
L1CACHE
L2CACHE
L3CACHE
SOCKET
NUMA
NODE
BOARD
HOST
CU
CLUSTER
However, only NODE is defined in the standard, which is why the
remaining splits are referred to using the OMPI_ prefix instead
of the standard MPI_ prefix.
@hppritcha, @ggouaillardet I have tested this using --without-hwloc and --with-hwloc=<path>,
which both give the same output.
@ggouaillardet I tried moving the constants to mpi-ext.h, but I ran into trouble when compiling because the generic code then relied on mpi-ext.h. It seems to me that many OMPI_* constants that only have meaning for Open MPI are still placed in mpi.h. Hence I have not moved the constants to mpi-ext.h.
NOTE: I think something fishy is going on in the locality operators.
In my test program I couldn't get the correct split for these requests:
NUMA, SOCKET, L3CACHE
where I expected a full communicator but only got one.