gaspi_proc_term sometimes hangs after otherwise successful program execution #44

Open
krzikalla opened this issue Jun 13, 2018 · 2 comments

@krzikalla

Configuration: hybrid MPI/GASPI application, TCP device, four processes on one node (so all communication is local).

Sometimes, during gaspi_proc_term, some processes hang in pthread_join (called from tcp_dev_stop_device).
The tcp_virt_dev thread that the join waits for is itself blocked in a read (I can't say which one).

@krzikalla changed the title from "gaspi_proc_term sometimes hangs after successful program execution" to "gaspi_proc_term sometimes hangs after otherwise successful program execution" on Jun 13, 2018
@krzikalla
Author

Further investigation: the hang occurs at tcp_device.c:1351. The read there blocks until data is available, but for some reason (unknown to me) no more data seems to arrive. Some processes have already finished gaspi_proc_term (apparently always proc 0, sometimes others as well), while the rest hang.
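
To illustrate the pattern described above (a join that waits on a thread stuck in a blocking read), here is a minimal sketch in plain POSIX C. It is not GPI-2 code and makes no claim about what tcp_device.c actually does. The names `struct worker`, `worker_main`, `wake_fd` and `run_shutdown_demo` are hypothetical. The worker waits in poll() on both a data descriptor and a separate wake-up pipe, so the joining thread can always unblock it, even when no more data ever arrives.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker state: data_fd carries payload, wake_fd signals shutdown. */
struct worker {
    int data_fd;
    int wake_fd;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char buf[256];

    for (;;) {
        struct pollfd fds[2] = {
            { .fd = w->data_fd, .events = POLLIN },
            { .fd = w->wake_fd, .events = POLLIN },
        };

        /* Wait on both descriptors instead of blocking in read() alone. */
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (fds[1].revents != 0)
            break; /* shutdown requested (byte written or pipe hung up) */
        if (fds[0].revents != 0 && read(w->data_fd, buf, sizeof buf) <= 0)
            break; /* EOF or error on the data channel */
    }
    return NULL;
}

/* Starts the worker on a data pipe that never receives data, requests
 * shutdown through the wake-up pipe, and joins. Returns 0 on success. */
static int run_shutdown_demo(void)
{
    int data[2], wake[2];
    int rc = -1;
    pthread_t tid;

    if (pipe(data) != 0)
        return -1;
    if (pipe(wake) != 0) {
        close(data[0]);
        close(data[1]);
        return -1;
    }

    struct worker w = { .data_fd = data[0], .wake_fd = wake[0] };
    if (pthread_create(&tid, NULL, worker_main, &w) == 0) {
        char stop = 1;
        int sent = (write(wake[1], &stop, 1) == 1);
        if (!sent) {
            /* Closing the write end hangs up the pipe, which also wakes poll(). */
            close(wake[1]);
            wake[1] = -1;
        }
        if (pthread_join(tid, NULL) == 0 && sent)
            rc = 0;
    }

    close(data[0]);
    close(data[1]);
    close(wake[0]);
    if (wake[1] >= 0)
        close(wake[1]);
    return rc;
}
```

With a bare read() on `data_fd` in place of the poll() loop, the pthread_join in `run_shutdown_demo` would never return, because nothing is ever written to the data pipe. Whether the same approach applies to the TCP device is an open question here.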

@rumach
Member

rumach commented Sep 20, 2019

Hi there. Do you have a simple program that reproduces this behaviour more or less consistently?
