osc/pt2pt fails with multiple threads #2614
Comments
As a workaround for issue open-mpi#2614 for the v2.0.2 release, do not allow for selection of the OSC PT2PT when creating an MPI RMA window. Print a hopefully helpful message and return a not-supported error. This PR should be reverted once a fix for open-mpi#2614 is in place. Signed-off-by: Howard Pritchard <[email protected]>
I can't get this to hang with osc/rdma. @markalle Was the 1sided.c test hanging with osc/rdma or just osc/pt2pt?
This commit fixes a bug in the timer check. When -fPIC is used we need to save/restore ebx. The code copied from patcher was meant for 32-bit systems and did not work correctly on 64-bit systems. This commit updates the save/restore to use rbx instead of ebx. Fixes open-mpi#2614 Signed-off-by: Nathan Hjelm <[email protected]>
Oops. Wrong bug :). Deleting those.
Ok, I can reproduce the issue with osc/pt2pt. Looks like something is still not right with PSCW. Taking a look now.
Found the bug. It's an artifact of the original design. I haven't had the time to move the counters into the sync object. Think I have a workaround. Testing it now.
Ok, definitely fixed. Running it through the tests a couple more times to shake out any remaining bugs. Will have a PR open for master, v2.x, and v2.0.3.
Still have one stubborn bug holding up the fix. It now hangs about 10% of the time. Will hopefully have this resolved today.
@hjelmn can this issue be closed?
I had been meaning to re-evaluate this one for a while. I built a vanilla OMPI v2.0.x with --enable-mpi-thread-multiple, and my results were: mpirun -mca osc rdma -host hostA:2 ... : passed
Finally found the time to track this down. I see what is going wrong in osc/pt2pt. I should have a fix ready for testing tomorrow.
Damn, this is a nasty bug. It's getting a lot further, but now I am running into another issue. Will keep cranking away at it until I know what the root cause is. Will start a parallel effort to ensure osc/rdma passes next month.
@hjelmn, is this issue specific to osc_pt2pt, or is it a problem with osc progress / multithreading in general that is only tickled by osc_pt2pt?
Per discussion with @hjelmn and @hppritcha on 26 Mar 2018:
As a workaround for issue open-mpi#2614 for the v2.0.2 release, do not allow for selection of the OSC PT2PT when creating an MPI RMA window. Print a hopefully helpful message and return a not-supported error. This PR should be reverted once a fix for open-mpi#2614 is in place. Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit d0ffd66) The original commit message is shown above. Followup: as of this writing (26 Mar 2018), we do not plan to fix this issue for the v2.0.x or v2.x series. Hence, the osc/pt2pt component will continue to disable itself in THREAD_MULTIPLE scenarios for the life of all v2.x series. It is possible (likely?) that this will be fixed in a v3.0.x release (where x>1).
Per discussion 2018-05-29: @hjelmn says that a better solution would be to get everything to support osc/rdma. E.g., get vader, TCP, and the upcoming OFI BTLs to support RDMA, which then works with osc/rdma. So here's the way forward:
As a workaround for issue open-mpi#2614 for the v2.0.2 release, do not allow for selection of the OSC PT2PT when creating an MPI RMA window. Print a hopefully helpful message and return a not-supported error. This PR should be reverted once a fix for open-mpi#2614 is in place. Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit d0ffd66)
Per discussion at open-mpi#2614 (comment), do not allow for selection of the OSC PT2PT when creating an MPI RMA window when THREAD_MULTIPLE is active. Print a helpful message and return a not-supported error. Signed-off-by: Howard Pritchard <[email protected]> Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit d0ffd66)
Per discussion at open-mpi#2614 (comment), do not allow for selection of the OSC PT2PT when creating an MPI RMA window when THREAD_MULTIPLE is active. Print a helpful message and return a not-supported error. Signed-off-by: Howard Pritchard <[email protected]> Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit d0ffd66) Signed-off-by: Jeff Squyres <[email protected]>
Currently the psm and psm2 MTLs use osc/pt2pt, with its limitations understood. There is a workaround using the OFI path, but it is still a workaround.
@matcabral I think @hjelmn's plan is to make all the BTLs support put/get, and then all transports can use osc/rdma. The need for osc/pt2pt can therefore go away. That would be the purpose of the OFI BTL: just for one-sided. Make sense?
@jsquyres, yes, thanks for the clarification.
I expect all transports that do not currently work with osc/rdma will see a performance improvement. How large will depend on the AMO and RDMA implementations.
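To make the put/get plan above concrete, here is a minimal sketch in C of the capability check it implies: a transport that provides both put and get can be driven by osc/rdma, and anything else falls back to osc/pt2pt. Every name here (`sketch_btl`, `sketch_rma_fn`, `sketch_noop_rma`, `sketch_select_osc`) is hypothetical and made up for illustration; this is not Open MPI's actual BTL interface or its component selection code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a one-sided transfer hook (illustration only). */
typedef int (*sketch_rma_fn)(void *local, size_t len, int peer);

/* Hypothetical, simplified transport descriptor. A NULL hook means the
 * transport does not implement that operation. */
struct sketch_btl {
    const char *name;
    sketch_rma_fn put;
    sketch_rma_fn get;
};

/* Placeholder hook so that a descriptor can advertise put/get support. */
static int sketch_noop_rma(void *local, size_t len, int peer)
{
    (void) local;
    (void) len;
    (void) peer;
    return 0;
}

/* A transport is usable for RDMA-style one-sided only if it has both hooks. */
static bool sketch_btl_supports_rdma(const struct sketch_btl *btl)
{
    return btl != NULL && btl->put != NULL && btl->get != NULL;
}

/* Prefer "rdma" whenever the transport can do put/get; otherwise "pt2pt". */
static const char *sketch_select_osc(const struct sketch_btl *btl)
{
    return sketch_btl_supports_rdma(btl) ? "rdma" : "pt2pt";
}
```

Under this model, once every transport fills in both hooks, the "pt2pt" branch is never taken, which is the sense in which the need for osc/pt2pt goes away.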
Per discussion at open-mpi#2614 (comment), do not allow for selection of the OSC PT2PT when creating an MPI RMA window when THREAD_MULTIPLE is active. Print a helpful message and return a not-supported error. Signed-off-by: Howard Pritchard <[email protected]> Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit d0ffd66) Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 5b7c866)
Per discussion at #2614 (comment), do not allow for selection of the OSC PT2PT when creating an MPI RMA window when THREAD_MULTIPLE is active. Print a helpful message and return a not-supported error. Signed-off-by: Howard Pritchard <[email protected]> Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit d0ffd66) Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 5b7c866)
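The behavior described in these commit messages can be sketched roughly as follows: when THREAD_MULTIPLE is active, the component declines to be selected, prints a message, and returns a not-supported error. This is only an illustration under stated assumptions. The function name `sketch_pt2pt_query`, the error constants, and the message text are all hypothetical and are not taken from the actual commit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes; these are not Open MPI's real constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_SUPPORTED = -1 };

/* Hypothetical stand-in for the hook that is consulted when a component is
 * considered for a new RMA window. The caller passes in whether the MPI
 * library was initialized with MPI_THREAD_MULTIPLE. */
static int sketch_pt2pt_query(bool thread_multiple_active)
{
    if (thread_multiple_active) {
        /* Refuse selection and tell the user why (message text is made up). */
        fprintf(stderr,
                "osc/pt2pt: not available when MPI_THREAD_MULTIPLE is active "
                "(see issue #2614)\n");
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    /* Single-threaded use is unaffected by the workaround. */
    return SKETCH_SUCCESS;
}
```

With a guard of this shape, window creation fails early with a clear error instead of hanging or producing wrong answers later.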
Testing with the osc/pt2pt component revealed multiple hangs and wrong answers when running with two threads. Each thread works with its own duplicate of MPI_COMM_WORLD and its own private windows. The test is here:
PR #2630 will need to be reverted when a resolution to this issue is committed on the v2.0.x branch.