SST: unexpected connection close event on Summit #3269
Comments
Thanks for the report. Unfortunately it can be very difficult to sort out the proximate cause for this sort of thing, even with full verbosity. I'll be trying to duplicate the failure scenario myself to see if I can sort it out.
Hi @eisenhauer. You're welcome. On a RHEL7 workstation using ADIOS 2.8.0.84 the server/writer process writes the
Adding calls to engine
Ah yes, that will do it...
@eisenhauer Is there any chance the created/opened Engines can be closed by the Engine or IO destructor? IIUC, the IO and adios objects are RAII and having the Engine clean up after itself would be nice.
That's probably worth talking about. Currently I don't think any of the engines have anything other than the default destructor. Mostly that works out OK for the file engines: open FDs will get cleaned up and already-written data flushed, and there are no special measures that one has to take to 'finalize' the open files. SST is a different beast though, with a lot of interaction between the readers and writers. Clean shutdown is a complicated dance. But perhaps we can simply call Close() in the destructor. My worry would be that, for example for writer engines, Close() can take a long time because it might be waiting on its peer to consume all the data. Close() also involves MPI collective operations (to make sure that no rank goes away before all the others). I'm not sure how I feel about that being embedded in a destructor. But at a minimum we could emit a warning or throw an exception if the destructor is called without having called Close() first.
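To make the "warn if the destructor runs without Close()" idea concrete, here is a minimal sketch. SketchEngine and DestroyEngine are hypothetical names invented for illustration; this is not ADIOS2 code and not how any existing engine is implemented. The destructor only reports the omission and does no shutdown work, so nothing slow or collective runs during destruction.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a streaming engine; NOT the ADIOS2 Engine class.
class SketchEngine {
public:
    explicit SketchEngine(std::string name, std::ostream &log = std::cerr)
        : name_(std::move(name)), log_(log) {}

    SketchEngine(const SketchEngine &) = delete;
    SketchEngine &operator=(const SketchEngine &) = delete;

    // Explicit shutdown. In a real engine this is where the peer handshake
    // and any collective synchronization would happen.
    void Close() { open_ = false; }

    bool IsOpen() const { return open_; }

    // The destructor does not attempt a shutdown; it only reports that
    // Close() was never called.
    ~SketchEngine() {
        if (open_) {
            log_ << "warning: engine '" << name_
                 << "' destroyed without Close()\n";
        }
    }

private:
    std::string name_;
    std::ostream &log_;
    bool open_ = true;
};

// Returns the text logged when an engine goes out of scope,
// with or without a prior call to Close().
std::string DestroyEngine(bool callClose) {
    std::ostringstream log;
    {
        SketchEngine engine("demo", log);
        if (callClose) {
            engine.Close();
        }
    }  // engine is destroyed here
    return log.str();
}
```

Calling DestroyEngine(true) yields an empty log, while DestroyEngine(false) yields the warning line. A warning is used rather than an exception because throwing from a destructor terminates the program by default in C++11 and later.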
Describe the bug
The following repo has an example that creates IOs and Engines for sending and receiving data between a server and client.
https://github.com/SCOREC/adios2SstTest/tree/6965c970f67b7404a70daa0b7dd12f49f0862521
Following the build and run instructions in the README results in the following output from the job using SST:
Note, the repo contains the output logs from a run with SstVerbose=5.
I have an application that uses this IO and Engine setup logic. It appears to complete three data exchange iterations, but at the end of execution it produces the "unexpected connection close event" and "SST stream open at exit" outputs as seen above in the simple example. I'm concerned that there is an underlying problem that may cause failures in larger/longer runs and would like to fix it.
To Reproduce
See https://github.com/SCOREC/adios2SstTest/blob/6965c970f67b7404a70daa0b7dd12f49f0862521/README.md
Expected behavior
I expect there to be no warnings/errors when running with SST.
System and Environment
gcc/10.2.0 system module
cmake/3.21.3 system module
adios2/2.7.1 system module
Additional context
None
Following up
This was an ID10T error. I was missing the calls to Engine Close().
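For anyone who hits the same thing, one way to make a forgotten Close() harder in application code is a small scope guard. The sketch below is an assumption-laden illustration: CloseGuard and FakeEngine are hypothetical names, not part of ADIOS2. The guard should work with any type that can be closed with a plain Close() call, which I believe includes adios2::Engine, but that is not verified here.

```cpp
// Hypothetical helper (not part of ADIOS2): calls Close() on scope exit
// unless the engine was already closed through the guard.
template <typename EngineT>
class CloseGuard {
public:
    explicit CloseGuard(EngineT &engine) : engine_(&engine) {}

    CloseGuard(const CloseGuard &) = delete;
    CloseGuard &operator=(const CloseGuard &) = delete;

    // Preferred path: close explicitly at a point all ranks reach together.
    void Close() {
        if (engine_ != nullptr) {
            EngineT *engine = engine_;
            engine_ = nullptr;  // never call Close() twice
            engine->Close();
        }
    }

    // Fallback path for early returns. Exceptions are swallowed because
    // a destructor must not throw.
    ~CloseGuard() {
        try {
            Close();
        } catch (...) {
        }
    }

private:
    EngineT *engine_;
};

// Fake engine used only to exercise the guard; it counts Close() calls.
struct FakeEngine {
    int closeCalls = 0;
    void Close() { ++closeCalls; }
};

// The guard closes the engine when the scope ends without an explicit call.
int CloseCallsAfterEarlyReturn() {
    FakeEngine engine;
    {
        CloseGuard<FakeEngine> guard(engine);
    }
    return engine.closeCalls;
}

// An explicit Close() through the guard is not repeated by the destructor.
int CloseCallsAfterExplicitClose() {
    FakeEngine engine;
    {
        CloseGuard<FakeEngine> guard(engine);
        guard.Close();
    }
    return engine.closeCalls;
}
```

Both helper functions return 1. The caveat raised earlier in the thread still applies: with SST, Close() may block on the peer and involves MPI collectives, so the destructor fallback is best treated as a safety net and the explicit guard.Close() call as the normal path.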