reader go routine hangs and leaks when Connection.Close() is called multiple times #69
fho added a commit to fho/amqp091-go that referenced this issue on Apr 13, 2022
Add a testcase for the bug where the reader go-routine tries to send a message to the unbuffered rpc channel but call() has already terminated, because it read an error from the errors chan or the errors chan was closed. This causes the reader routine to get stuck forever; it does not terminate when the connection is closed. More information: rabbitmq#69. This testcase does not reproduce the issue reliably, but it is triggered in ~80% of executions.
lukebakken added a commit to fho/amqp091-go that referenced this issue on Apr 14, 2022
…stcase' into reader_routine_leak
lukebakken pushed a commit to fho/amqp091-go that referenced this issue on Apr 14, 2022
When a message was sent and its response was received while the connection was being closed or an error happened, the reader go-routine could get stuck and be leaked.
The reader go-routine tries to send a received message to the unbuffered c.rpc channel via the dispatch0() and dispatchN() methods. The call() method reads from the rpc channel. If an error happened while the dispatch method was sending a message to the rpc channel, the call() method could terminate because it read an error from c.errors or because c.errors was closed.
To prevent this scenario:
- the reader go-routine now closes c.rpc when it terminates,
- the call() method reads from c.rpc until a message is received or c.rpc is closed. When c.rpc is closed, it reads an error from c.errors or waits until c.errors is closed. When it reads an error, it returns it. If c.errors is closed, it returns ErrClosed.
This ensures that the message is read from c.rpc before call() returns. It also ensures that a received message is processed. Previously the message could be silently ignored because c.errors returned an error or was closed.
tests: add testcase to ensure reader routine terminates
Add a testcase for the bug where the reader go-routine tries to send a message to the unbuffered rpc channel but call() has already terminated, because it read an error from the errors chan or the errors chan was closed. This causes the reader routine to get stuck forever; it does not terminate when the connection is closed. More information: rabbitmq#69. This testcase does not reproduce the issue reliably, but it is triggered in ~80% of executions.
Bump GH actions versions
Add step in GH actions to download goleak dependency
lukebakken added a commit to fho/amqp091-go that referenced this issue on Apr 19, 2022
lukebakken pushed a commit to fho/amqp091-go that referenced this issue on Apr 19, 2022
When a message was sent and its response was received while the connection was being closed or an error happened, the reader go-routine could get stuck and be leaked.
The reader go-routine tries to send a received message to the unbuffered c.rpc channel via the dispatch0() and dispatchN() methods. The call() method reads from the rpc channel. If an error happened while the dispatch method was sending a message to the rpc channel, the call() method could terminate because it read an error from c.errors or because c.errors was closed.
To prevent this scenario:
- the reader go-routine now closes c.rpc when it terminates,
- the call() method reads from c.rpc until a message is received or c.rpc is closed. When c.rpc is closed, it reads an error from c.errors or waits until c.errors is closed. When it reads an error, it returns it. If c.errors is closed, it returns ErrClosed.
This ensures that the message is read from c.rpc before call() returns. It also ensures that a received message is processed. Previously the message could be silently ignored because c.errors returned an error or was closed.
tests: add testcase to ensure reader routine terminates
Add a testcase for the bug where the reader go-routine tries to send a message to the unbuffered rpc channel but call() has already terminated, because it read an error from the errors chan or the errors chan was closed. This causes the reader routine to get stuck forever; it does not terminate when the connection is closed. More information: rabbitmq#69. This testcase does not reproduce the issue reliably, but it is triggered in ~80% of executions.
Bump GH actions versions
add missing go.sum file
tests/TestRequiredServerLocale: close connection on testcase termination
The TestRequiredServerLocale testcase was not closing the connection that it opened. This caused the goleak detector in the TestReaderGoRoutineTerminatesWhenMsgIsProcessedDuringClose testcase to complain about a leaked heartbeat go-routine.
tests: remove duplicate goleak invocation
goleak is now called in TestMain(). The invocation in the TestReaderGoRoutineTerminatesWhenMsgIsProcessedDuringClose testcase can be removed.
tests/ExampleConnection_reconnect: close connection on termination
ExampleConnection_reconnect was not closing the opened connection on termination. This caused the goleak checker to complain about a leaked heartbeat go-routine. Close the connection.
Hello,
we are using go.uber.org/goleak for our internal amqp wrapper package and stumbled over a go-routine leak.
This happens in an internal testcase where we send 2 messages to a non-existing exchange. This causes an error that is received by our reconnect go-routine on the NotifyClose channel, which closes the connection; the connection is then also closed in our client (after the reconnect go-routine has terminated).
In our testcase we only call 1x close on the channel (unnecessary) and 1x Close on the connection.
I managed to reproduce the same hang and go-routine leak when calling Close in parallel on the same connection. This is not the same scenario that happens in our internal testcase but it reproduces it. :-)
I'm using amqp091-go commit 6cac2fa.
This issue can be reproduced with the following testcase:
Test output:
On my machine it happens on almost every execution, though some runs succeed without the leak.
It might be necessary to run the testcase multiple times to run into the issue:
Update:
I think I now understand how it happens:
- The reader go-routine calls c.demux() [1] and passes the msg to the rpc chan [2].
- The call() method has not read the msg from the rpc channel yet; it is waiting in the select loop [4].
- Because the errors channel is closed, call() returns without reading the msg from the rpc channel.
- The reader go-routine hangs forever in dispatch0(), trying to send the msg to the rpc channel [2], because it is unbuffered.
I guess this scenario, where call() returns before reading a msg from the rpc chan, could also be triggered when Close() is called only once but an error happens shortly after a message response was received: call() could return because an error is read from c.errors while dispatch0 is sending a message to the rpc chan.
Footnotes
1. https://github.com/rabbitmq/amqp091-go/blob/6cac2faf74b0e761395b4da4ebfa3fe4a8eb8b59/connection.go#L350
2. https://github.com/rabbitmq/amqp091-go/blob/6cac2faf74b0e761395b4da4ebfa3fe4a8eb8b59/connection.go#L483
3. https://github.com/rabbitmq/amqp091-go/blob/6cac2faf74b0e761395b4da4ebfa3fe4a8eb8b59/connection.go#L425
4. https://github.com/rabbitmq/amqp091-go/blob/6cac2faf74b0e761395b4da4ebfa3fe4a8eb8b59/connection.go#L692