-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ponyc segfault during runtime shutdown #1608
Labels
Comments
dipinhora
added a commit
to dipinhora/ponyc
that referenced
this issue
Nov 25, 2017
Prior to this commit, the shutdown process could potentially cause the runtime to end up in an invalid state where the ASIO backend had been stopped but the shutdown of the runtime was aborted. If this occurred and any actor subsequently tried to subscribe to an ASIO event, it would cause a segfault because the ASIO backend would be `NULL` instead of an actual `asio_backend_t*`. This commit changes things so that if the shutdown is aborted due to the receipt of a new `UNBLOCK` message andthe ASIO has already been stopped, we try and restart the ASIO backend. This commit also adds some extra assertions around the use of the ASIO backend to ensure it is not `NULL` when it shouldn't be. This should resolve WallarooLabs/wally/issues/1608.
dipinhora
added a commit
to dipinhora/ponyc
that referenced
this issue
Nov 25, 2017
Prior to this commit, the shutdown process could potentially cause the runtime to end up in an invalid state where the ASIO backend had been stopped but the shutdown of the runtime was aborted. If this occurred and any actor subsequently tried to subscribe to an ASIO event, it would cause a segfault because the ASIO backend would be `NULL` instead of an actual `asio_backend_t*`. This commit changes things so that if the shutdown is aborted due to the receipt of a new `UNBLOCK` message and the ASIO backend has already been stopped, we try and restart the ASIO backend. This commit also adds some extra assertions around the use of the ASIO backend to ensure it is not `NULL` when it shouldn't be. This should resolve WallarooLabs/wally/issues/1608.
dipinhora
added a commit
to dipinhora/ponyc
that referenced
this issue
Nov 25, 2017
Prior to this commit, the shutdown process could potentially cause the runtime to end up in an invalid state where the ASIO backend had been stopped but the shutdown of the runtime was aborted. If this occurred and any actor subsequently tried to subscribe to an ASIO event, it would cause a segfault because the ASIO backend would be `NULL` instead of an actual `asio_backend_t*`. This commit changes things so that if the shutdown is aborted due to the receipt of a new `UNBLOCK` message and the ASIO backend has already been stopped, we try and restart the ASIO backend. This commit also adds some extra assertions around the use of the ASIO backend to ensure it is not `NULL` when it shouldn't be. This should resolve WallarooLabs/wally/issues/1608.
👏 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
##tldr;
Background
While addressing a flip-flopping test failure in wallaroo, where the
DataChannel
's shutdown sequence had a race condition, we set up a shell script to run thedata_channel
unit tests in a loop until they failed because of that error. This resulted in the unit test failure being reproducible within a few seconds if you ran that test loop in parallel on multiple shells.The fix involved applying two patches from mainlaine ponyc's TCPConnection to all of our copies of TCP Connection: ponylang/ponyc#1578, ponylang/ponyc#1626.
While this helped us fix the original issue, it also exposed a much less frequent segfault that seems to occur while running these tests.
@dipinhora investigated this
Conclusion
There appears to be a race condition in the pony runtime shutdown code handling the ASIO backend. This results in a segfault during runtime shutdown.
We want to come up with code to reproduce this that's independent of Wallaroo, and open an issue to fix it against mainline Ponyc.
The text was updated successfully, but these errors were encountered: