IMB testsuite fails when using vader #4260
Comments
The issue seems to be fixed in 2.1.3a1 and in master.
@bosilca Do you have a SHA-1? I need to get 2.1.2 working for our next release (2.1.3 will be too late).
I see the same issue with the tip of the v2.x branch.
Note: this does not happen every time. Sometimes it stalls, sometimes it works.
I used the current head at cb36cf9. I ran the test you mentioned 100 times and couldn't get any segfault (on an x86).
I can still see the bug on this SHA. I see it on v3.0.0 too.
@bosilca Are you on x86 or x86_64? The 64-bit build works fine; it's the 32-bit version that breaks.
I haven't built the 32-bit version in ages...
I've never seen the issue on x86_64. The i586 build has probably been broken for a while, but the testsuite I run failed silently due to a glitch in some script.
Bump. This is still happening for both v2.x and v3 on x86 (32-bit).
Can confirm this is a bug when running an i386 build. Taking a look now.
It looks like this is due to a missing memory barrier. Testing the fix now.
There were multiple paths that could lead to a fast box allocation. One of them made little sense (in-place send) so it has been removed to allow a rework of the fast-box send function. This should fix a number of issues with hanging/crashing when using the vader btl. References open-mpi#4260, open-mpi#4553. Signed-off-by: Nathan Hjelm <[email protected]>
There were multiple paths that could lead to a fast box allocation. One of them made little sense (in-place send) so it has been removed to allow a rework of the fast-box send function. This should fix a number of issues with hanging/crashing when using the vader btl. References open-mpi#4260 Signed-off-by: Nathan Hjelm <[email protected]>
PR #4569 seems to fix the issue on both Open MPI 2 and 3.
@hjelmn Thanks for that fix.
There were multiple paths that could lead to a fast box allocation. One of them made little sense (in-place send) so it has been removed to allow a rework of the fast-box send function. This should fix a number of issues with hanging/crashing when using the vader btl. References #4260 Signed-off-by: Nathan Hjelm <[email protected]>
There were multiple paths that could lead to a fast box allocation. One of them made little sense (in-place send) so it has been removed to allow a rework of the fast-box send function. This should fix a number of issues with hanging/crashing when using the vader btl. References open-mpi#4260 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit a82f761) Signed-off-by: Nathan Hjelm <[email protected]>
Now merged on all branches. Closing.
Using Open MPI 2.1.2 and the Intel MPI Benchmark suite (https://software.intel.com/sites/default/files/managed/76/6c/IMB_2017_Update2.tgz) on x86 systems (multiple SUSE versions), I get this error
while
mpirun -np 2 --mca btl sm,self /usr/lib/mpi/gcc/openmpi2/tests/IMB/IMB-MPI1
works fine. I tried to debug the SEGV with gdb but have had no success yet.