Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No reconnection with SSL #265

Closed
calvin-nl opened this issue Apr 12, 2024 · 3 comments · Fixed by #266
Closed

No reconnection with SSL #265

calvin-nl opened this issue Apr 12, 2024 · 3 comments · Fixed by #266
Labels

Comments

@calvin-nl
Copy link

We've been using Redix with AWS ElastiCache with ssl for qute awhile. However, we recently hit an issue where Redix would not reconnect after disconnecting. In particular, we started getting %Redix.ConnectionError{reason: :closed} on all commands. This continued to happened for around 12 hours while other applications were able to connect to ElastiCache just fine. The issue was resolved after restarting the application. There would have been plenty of opportunities for a reconnect to happen over the 12 hour span.

Digging into Redix a bit:
If it receives an error on the send, it'll close the socket and go into disconnected state: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L273-L277

Once in disconnected state, Redix will respond to all commands with :closed, which was what we were seeing: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L198-L200

The reconnect logic will happen when disconnect is called: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L316-L329

And disconnect is called when the socket_owner sends a stopped message to the connection while the connection is in disconnected state: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L206-L216

The socket_owner sends stopped when it receives one of ssl_closed/ssl_error:
https://github.com/whatyouhide/redix/blob/main/lib/redix/socket_owner.ex#L68-L74
https://github.com/whatyouhide/redix/blob/main/lib/redix/socket_owner.ex#L136-L139

So if the socket_owner never receives ssl_closed/ssl_error, it'll never attempt a reconnect.

We created a toy example to see if the client would get a :ssl_closed/:ssl_error message if the server closed the connection, and indeed, the client does not receive such a message. Running the same example with gen_tcp, the client will receive a :tcp_closed message.

Server

defmodule TlsExampleServer do
  use GenServer

  @ip {127, 0, 0, 1}
  @port 54343

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{})
  end

  def init(state) do
    Logger.warning("Started server")
    {:ok, state, {:continue, :listen}}
  end

  def handle_continue(:listen, state) do
    :ssl.start()

    # Create an SSL server socket
    {:ok, listen_socket} = :ssl.listen(@port, [
      certs_keys: [%{
        :keyfile => "key.pem",
        :certfile => "certificate.pem",
        :password => "12345"
      }]
    ])

    # Accept incoming connections
    {:ok, accept_socket} = :ssl.transport_accept(listen_socket)
    {:ok, accept_socket} = :ssl.handshake(accept_socket)

    Logger.warning("Server sending message 1")
    res = :ssl.send(accept_socket, "Hello World")
    Logger.warning("Server send res #{inspect(res)}")

    Logger.warning("Server sleeping")
    Process.sleep(1000)

    Logger.warning("Closing server socket")
    :ssl.close(accept_socket)
    :ssl.close(listen_socket)
    {:stop, :normal, %{}}
  end
end

Client

defmodule TlsExampleClient do
  use GenServer

  @ip {127, 0, 0, 1}
  @port 54343

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{})
  end

  def init(state) do
    Logger.warning("Started client")
    {:ok, state, {:continue, :connect_send}}
  end

  def handle_continue(:connect_send, state) do
    {:ok, connect_socket} = :ssl.connect(@ip, @port, [ verify: :verify_none, active: :once ], :infinity)

    Logger.warning("Client sleeping")
    Process.sleep(3000)
    Logger.warning("Client done sleeping")
    Process.send_after(self(), :close_socket, 1000)

    {:noreply, %{socket: connect_socket}}
  end

  def handle_info(:close_socket, state) do
    Logger.warning("Client closing socket")
    :ssl.close(state.socket)
    {:noreply, state}
  end

  def handle_info({:tcp_error, socket, reason}, state) do
    Logger.warning("Client got tcp error #{inspect(reason)} #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:tcp_closed, socket}, _state) do
    Logger.warning("Client got tcp closed #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl_closed, socket}, _state) do
    Logger.warning("Client got ssl closed #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl_error, socket, reason}, _state) do
    Logger.warning("Client got ssl error #{inspect(reason)} #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl, socket, data}, state) do
    Logger.warning("Client got ssl message #{inspect(data)} from socket #{inspect(socket)}")
    {:noreply, state}
  end
end

The messages outputted for ssl:

[warning] Client sleeping {"details":{}}
[warning] Server sending message 1 {"details":{}}
[warning] Server send res :ok {"details":{}}
[warning] Server sleeping {"details":{}}
[warning] Closing server socket {"details":{}}
[warning] Client done sleeping {"details":{}}
[warning] Client got ssl message 'Hello World' from socket {:sslsocket, {:gen_tcp, #Port<0.48>, :tls_connection, :undefined},[#PID<0.1187.0>, #PID<0.1185.0>]} {"details":{}}
[warning] Client closing socket {"details":{}}

We would have expected the client to receive an :ssl_closed message.

The messages outputted when using gen_tcp:

[warning] Client sleeping {"details":{}}
[warning] Server sending message 1 {"details":{}}
[warning] Server send res :ok {"details":{}}
[warning] Server sleeping {"details":{}}
[warning] Closing server socket {"details":{}}
[warning] Client done sleeping {"details":{}}
[warning] Client got tcp message 'Hello World' from socket #Port<0.35> {"details":{}}
[warning] Client got tcp closed #Port<0.35> {"details":{}}

This is on:

elixir 1.14.5-otp-25
erlang 25.3.2.9
@whatyouhide whatyouhide changed the title No reconnection with ssl No reconnection with SSL Apr 12, 2024
@whatyouhide
Copy link
Owner

Oh fantastic catch. To reiterate what you just said:

  1. Connection goes to :disconnected state if sending errors out.
  2. However, connection waits for the socket owner to also go down in that case.
  3. Sometimes, owner doesn't go down (because it never receives *_closed messages).
  4. Connection stuck in :disconnected state with no reconnection attempts.

I'll work on a fix this weekend 🙃

@wchenNL
Copy link

wchenNL commented Apr 18, 2024

Appreciate the fix, are you planning to have a new release with this? @whatyouhide

@whatyouhide
Copy link
Owner

@wchenNL just released v1.4.2 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants