I don't know if this is an actual error (that the user should be informed about) or a "it's ok, we'll just ignore it and move on to the next interface in the list" kind of issue (that could probably only be reported with a high enough verbosity). Regardless, "Socket closed" is probably not enough of a detailed message to convey meaningful information to the end user. 😄
I don't know if this is related to #3035 or #5818, but it's worth cross-referencing them here.
#5892 is a workaround for this problem. This is unrelated to #3035 or #5818 and only happens on master.
Jeff actually got the code slightly wrong. The issue is on the next line down, where the BTL calls back to the PML with the disconnect. OB1 just aborts the job on an error callback. In this case, though, the callback fires because readv() returned 0, which happened because the other side was in MPI_FINALIZE and had already shut down the socket.
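To make the distinction concrete, here is a minimal sketch of how a receive path can tell an orderly peer shutdown (readv() returning 0) apart from a real socket error. This is not the TCP BTL code: `recv_status_t`, `classify_readv()`, and `recv_once()` are hypothetical names invented for illustration, and it assumes a POSIX stream socket and a non-zero buffer length.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical status codes for illustration; not Open MPI types. */
typedef enum {
    RECV_DATA,        /* readv() delivered at least one byte */
    RECV_PEER_CLOSED, /* readv() returned 0: peer shut down cleanly */
    RECV_RETRY,       /* transient condition, try again later */
    RECV_ERROR        /* genuine socket error */
} recv_status_t;

/* Classify the result of a readv() call on a stream socket.
 * Assumes the requested length was non-zero, so a return of 0
 * can only mean end-of-stream. */
static recv_status_t classify_readv(ssize_t rc, int saved_errno)
{
    if (rc > 0) {
        return RECV_DATA;
    }
    if (rc == 0) {
        /* Not an error: the other side closed its end of the socket. */
        return RECV_PEER_CLOSED;
    }
    if (saved_errno == EINTR || saved_errno == EAGAIN ||
        saved_errno == EWOULDBLOCK) {
        return RECV_RETRY;
    }
    return RECV_ERROR;
}

/* Perform one readv() into a single buffer and classify the outcome. */
static recv_status_t recv_once(int fd, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t rc = readv(fd, &iov, 1);
    return classify_readv(rc, errno);
}
```

With a split like this, a caller could in principle report only `RECV_ERROR` to the upper layer as a fatal error and handle `RECV_PEER_CLOSED` quietly. Whether that is the right policy for the BTL/PML callback is a separate question.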
This reverts commit 6acebc4.
This patch is causing numerous "Socket closed" messages which are
causing most of the failures on Cisco's MTT run. See
open-mpi#5849 for more information.
Signed-off-by: Brian Barrett <[email protected]>
Based on MTT results, da1189d (which reverted 6acebc4) fixed the test failures. Since there are open issues for all the other failures we're seeing, I'm going to close this ticket.
Cisco MTT is getting a lot of "Socket closed" messages from the TCP BTL.
For example: https://mtt.open-mpi.org/index.php?do_redir=2762
That message appears to come from here:
ompi/opal/mca/btl/tcp/btl_tcp_endpoint.c
Lines 542 to 558 in 5f1c940