-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow client to reconnect if other side closes connection #478
base: master
Are you sure you want to change the base?
Conversation
Ah, sorry - I've forgotten to resend data after reconnection - some additional fixes needed |
Now all works fine in my environment. What should I do with test coverage? ) Where are no any tests related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all works fine in my environment. What should I do with test coverage? ) Where are no any tests related to
Elixir.Connection
behaviour inclient_test.exs
.. Should I implement some additional test cases?
We have some more integration style tests in server_test.exs
. It would be good to get at test for new behavior.
However would using :inet.setopts(sock, active: :once)
in the client process allow us to handle idle disconnect earlier and not have to retry a request? We should usually get a close notification message, and can reconnect proactively? Then instead of :gen_tcp.recv
we could do receive do {:tcp, ^socket, data} ->
etc.
Also its not always the case that all connection closes can be retried, right now this could double the request rate to a badly behaving service :(.
Agree. The better solution is to handle disconnects at the moment of disconnect. I think that we can change active option to
Why it doubles request rate? If connection is closed at the moment of sending then where are no data sent at all and it will sent once after reconnect |
2ce786a
to
292776b
Compare
292776b
to
a0be2d9
Compare
@fishcakez I'm ready for your re-review ) |
I like what you did with
Unfortunately with TCP we can not tell why the socket closed. There are three main scenarios:
I wonder if handling the disconnect immediately (active: true with message handling) is enough for the issue you're seeing as we would catch scenario 1) most of the time but never retry on 2) or 3). Perhaps we can split this PR in two, and follow up on reconnect retry in a second one if its still a problem. |
I'm writing this PR not only for my case but for any possible similar cases. But I agree that in case of 2) and 3) we should not resend data |
@fishcakez I've kept |
Hi, guys! I need a SASL authentication for hbase. I can prepare another PR with auth backend support, but I need this PR to be included in my code too (In my |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @cybernetlab, I messed up my github notifications temporarily so missed the your last change. One comment to handle and then we can merge this change.
{:disconnect, :reconnect, s} | ||
end | ||
|
||
def handle_info(_, s) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle the {:tcp_error, sock, _}
, {:ssl_error, sock, _}
, {:tcp, sock, _}, {;ssl, sock, _}
here but ignore messages from old sockets (or clean them up in disconnect/2).
@fishcakez ready to review ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! My only request is that we document the new :reconnect
behavior somewhere (in a @moduledoc
and/or in the README.md
).
@jparise is it enough to add:
as a last paragraph to the end of |
Yes, that would be fine. |
end | ||
|
||
def handle_info({:ssl_error, sock, _error}, %{reconnect: true, sock: {:ssl, sock}} = s) do | ||
{:disconnect, :reconnect, s} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always disconnect and make the decision on reconnect there. If reconnect: false
we want to disconnect too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case many tests fails. If I disconnect on any TCP or SSL message with reconnect: false
following output produced by mix test
:
19:52 $ mix test
Compiling 1 file (.ex)
Excluding tags: [pending: true]
............................................................................................................................................
1) test clients exit if they try to use a closed client (Thrift.Generator.ServiceTest)
test/thrift/generator/service_test.exs:387
** (exit) exited in: :gen_server.call(#PID<0.1755.0>, {:call, "friend_ids_of", [<<10, 0, 1, 0, 0, 0, 0, 0, 0, 4, 210>> | <<0>>], []}, 5000)
** (EXIT) {:error, :closed}
code: assert Client.friend_ids_of(client, 1234) == {:error, :closed}
stacktrace:
(stdlib) gen_server.erl:223: :gen_server.call/3
(thrift) lib/thrift/binary/framed/client.ex:221: Thrift.Binary.Framed.Client.call/5
test/thrift/generator/service_test.exs:392: (test)
The following output was logged:
19:52:36.114 [error] Connection closed
19:52:36.115 [error] GenServer #PID<0.1755.0> terminating
** (stop) {:error, :closed}
Last message: {:tcp_closed, #Port<0.19>}
State: nil
..............
2) test clients exit on connection timeout (Thrift.Generator.ServiceTest)
test/thrift/generator/service_test.exs:373
** (exit) exited in: :gen_server.call(#PID<0.1958.0>, {:call, "friend_ids_of", [<<10, 0, 1, 0, 0, 0, 0, 0, 0, 4, 210>> | <<0>>], []}, 5000)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
code: assert {:error, closed} = Client.friend_ids_of(client, 1234)
stacktrace:
(stdlib) gen_server.erl:223: :gen_server.call/3
(thrift) lib/thrift/binary/framed/client.ex:221: Thrift.Binary.Framed.Client.call/5
test/thrift/generator/service_test.exs:382: (test)
The following output was logged:
19:52:36.155 [error] Connection closed
19:52:36.155 [error] GenServer #PID<0.1958.0> terminating
** (stop) {:error, :closed}
Last message: {:tcp_closed, #Port<0.59>}
State: nil
..
3) test it handles void oneway functions (Thrift.Generator.ServiceTest)
test/thrift/generator/service_test.exs:309
** (EXIT from #PID<0.1990.0>) {:error, :closed}
The following output was logged:
19:52:36.188 [error] Connection closed
19:52:36.188 [error] GenServer #PID<0.1996.0> terminating
** (stop) {:error, :closed}
Last message: {:tcp, #Port<0.68>, <<0>>}
State: nil
19:52:36.189 [error] GenServer #PID<0.1991.0> terminating
** (stop) {:error, :closed}
Last message: {:EXIT, #PID<0.1990.0>, {:error, :closed}}
State: {:state, {#PID<0.1991.0>, Supervisor.Default}, :one_for_one, {[ranch_listener_sup: Thrift.Generator.ServiceTest.ServerSpy], %{{:ranch_listener_sup, Thrift.Generator.ServiceTest.ServerSpy} => {:child, #PID<0.1992.0>, {:ranch_listener_sup, Thrift.Generator.ServiceTest.ServerSpy}, {:ranch_listener_sup, :start_link, [Thrift.Generator.ServiceTest.ServerSpy, :ranch_tcp, %{num_acceptors: 1, socket_opts: [port: 0]}, Thrift.Binary.Framed.ProtocolHandler, {Thrift.Generator.ServiceTest.SimpleService.Binary.Framed.Server, Thrift.Generator.ServiceTest.ServerSpy, [], []}]}, :permanent, :infinity, :supervisor, [:ranch_listener_sup]}}}, :undefined, 0, 5, [], 0, Supervisor.Default, {:ok, {%{intensity: 0, period: 5, strategy: :one_for_one}, [{{:ranch_listener_sup, Thrift.Generator.ServiceTest.ServerSpy}, {:ranch_listener_sup, :start_link, [Thrift.Generator.ServiceTest.ServerSpy, :ranch_tcp, %{num_acceptors: 1, socket_opts: [port: 0]}, Thrift.Binary.Framed.ProtocolHandler, {Thrift.Generator.ServiceTest.SimpleService.Binary.Framed.Server, Thrift.Generator.ServiceTest.ServerSpy, [], []}]}, :permanent, :infinity, :supervisor, [:ranch_listener_sup]}]}}}
.................
4) test it can handle oneway messages (Servers.Binary.Framed.IntegrationTest)
test/thrift/binary/framed/server_test.exs:158
** (EXIT from #PID<0.2192.0>) {:error, :closed}
The following output was logged:
19:52:36.324 [error] Connection closed
19:52:36.324 [error] GenServer :"test it can handle oneway messages" terminating
** (stop) {:error, :closed}
Last message: {:tcp, #Port<0.114>, <<0>>}
State: nil
..................................................
Finished in 6.3 seconds
227 tests, 4 failures
Randomized with seed 350387
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure - should I correct these test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, any comments about my last question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to think about we handle these cases, and handle them.
For example we could do
try do
GenServer.call(...)
catch
:exit, {{:error, _} = error, _} ->
error
:exit, {:noproc, _} ->
{:error, :closed}
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is not to handle these cases and let users of library to handle it. But in this case some tests should be rewritten
It would be nice to allow framed client to reconnect if other side closed connection because of client inactivity for example. In my personal case thrift library used as a client for
hbase
and this server closes inactive connections in about 10 minutes. Also I'm usingpoolboy
to pool connections and without reconnection ability I'll need to do some unnecessary logic to monitor status of tcp connections and push out clients with closed connection from the pool.