This repository has been archived by the owner on Feb 21, 2023. It is now read-only.

ConnectionResetError: [Errno 104] Connection reset by peer #778

Open
Tracked by #1225
zzlpeter opened this issue Jul 16, 2020 · 58 comments · May be fixed by #1156
@zzlpeter

Hi, everybody!
I use tornado + aioredis, and recently I ran into this issue; the traceback is below.
My environment: aioredis==1.2.0, tornado==5.1.1
I create the pool with aioredis.create_redis_pool(**args).
Can anybody help? Thanks a lot.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tornado/web.py", line 1699, in _execute
    result = await result
  File "/views/notice.py", line 341, in get
    items, page_size, total_page, total_size = await Notice.cache_or_api_list(notice_id_list, page_count, page_size)
  File "models/notice.py", line 136, in cache_or_api_list
    items = await cls.query_list(page_list)
  File "models/notice.py", line 92, in query_list
    items = await asyncio.gather(*[Notice.cache_or_api(notice_id) for notice_id in notice_id_list])
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  [Previous line repeated 11 more times]
  File "/usr/local/lib/python3.6/site-packages/aioredis/connection.py", line 183, in _read_data
    obj = await self._reader.readobj()
  File "/usr/local/lib/python3.6/site-packages/aioredis/stream.py", line 94, in readobj
    await self._wait_for_data('readobj')
  File "/usr/local/lib/python3.6/asyncio/streams.py", line 464, in _wait_for_data
    yield from self._waiter
  File "/usr/local/lib/python3.6/asyncio/selector_events.py", line 723, in _read_ready
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

@CL545740896

I had the same ConnectionResetError.

@seandstewart
Collaborator

Can you please check out the latest master and test with that? Note that as of #891 the client has the same API as redis-py.

@seandstewart added the "need investigation" label (Need to look into described issue.) on Mar 19, 2021
@djstein

djstein commented Mar 31, 2021

Getting a few thousand of these a day when using Django Channels.
Tested with Python 3.8.1 and Python 3.9.2.
Using aioredis 1.3.1, installed as a dependency of channels.

@Andrew-Chen-Wang
Collaborator

@djstein Are you getting this in production? If you're experiencing this in development, please refer to #930 for migration guidance, as the v1 major version will not receive any fixes.

@rushilsrivastava

This error still exists in aioredis==2.0.0 in production.

@RonaldinhoL

I had the same ConnectionResetError.

@RonaldinhoL

From my tests, when this occurred, aioredis, redis-py, and asyncio-redis couldn't connect, but aredis could. What is the difference?

@eneloop2

eneloop2 commented Sep 9, 2021

Hi all, the problem lies with the ConnectionError type. aioredis implements its own ConnectionError(builtins.ConnectionError, RedisError), which causes problems because ConnectionResetError is not a subclass of that ConnectionError. If one looks at line 37 of client.py, it overwrites ConnectionError so that the exception can no longer be caught on line 1067....
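
To make the shadowing concern concrete, here is a minimal, self-contained sketch. It does not import aioredis: RedisError and the module-level ConnectionError below are stand-ins written for illustration, assuming only the base classes quoted above. It shows that once the name ConnectionError is rebound to a library class, an `except ConnectionError` clause no longer matches the builtin ConnectionResetError.

```python
import builtins


class RedisError(Exception):
    """Stand-in for a library base exception (illustrative, not aioredis code)."""


# Rebinding the name: from here on, a bare `ConnectionError` in this module
# refers to this class rather than to builtins.ConnectionError.
class ConnectionError(builtins.ConnectionError, RedisError):
    """Stand-in for a library-specific connection error."""


def which_handler_catches_reset() -> str:
    try:
        raise ConnectionResetError(104, "Connection reset by peer")
    except ConnectionError:
        # The shadowing class: ConnectionResetError does not derive from it,
        # so this branch is skipped.
        return "library ConnectionError"
    except builtins.ConnectionError:
        return "builtins.ConnectionError"


if __name__ == "__main__":
    print(issubclass(ConnectionResetError, builtins.ConnectionError))
    print(which_handler_catches_reset())
```

In this sketch the reset is only caught by the handler that names builtins.ConnectionError (or OSError) explicitly, which is one way a caller can guard against the shadowed name.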

@shaakaud

shaakaud commented Nov 4, 2021

Has any workaround been found for this issue?

@Andrew-Chen-Wang pinned this issue Nov 20, 2021
@Andrew-Chen-Wang unpinned this issue Nov 20, 2021
@Enchufa2

Enchufa2 commented Dec 2, 2021

Same issue here in production. There's a firewall resetting connections after some time, and aioredis won't recover from this, which is a serious problem. Is there any known workaround?

@Enchufa2

Enchufa2 commented Dec 2, 2021

Please, consider this issue in #1225.

@rushilsrivastava

This issue can be closed, it should've been solved with #1129.

@Enchufa2

Enchufa2 commented Dec 2, 2021

No, sorry, it doesn't solve this issue. Quick test:

  1. Run a Redis instance locally:
$ docker run --rm -d -p 6379:6379 --name redis redis
  2. Run a simple test.py program that continually reads something from Redis. For example (ugly, but easy to translate to redis):
from aioredis import Redis
import asyncio, time

redis = Redis()
loop = asyncio.get_event_loop()
loop.run_until_complete(redis.set("something", "something"))

while True:
  print(loop.run_until_complete(redis.get("something")))
  time.sleep(5)
  3. Kill the connection (technically this is a connection abort, not a connection reset, but it has the same effect and unveils the same underlying issue):
$ sudo ss -K dst [::1] dport = redis
  • The equivalent test program using redis instead of aioredis automagically recovers from the loss of connection and keeps going without error.

  • aioredis 2.0.0 gives:

Traceback (most recent call last):
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
    await asyncio.wait_for(
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
    await self._writer.drain()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
    await conn.send_command(*args)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
    await self.send_packed_command(
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
    raise ConnectionError(
aioredis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.

  • The current aioredis master branch gives:

Traceback (most recent call last):
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 762, in disconnect
    await self._writer.wait_closed()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 359, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
    await asyncio.wait_for(
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
    await self._writer.drain()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 375, in drain
    raise exc
  File "/usr/lib64/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionAbortedError: [Errno 103] Software caused connection abort

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
    await conn.send_command(*args)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
    await self.send_packed_command(
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
    raise ConnectionError(
aioredis.exceptions.ConnectionError: Error 103 while writing to socket. Software caused connection abort.

So we switched error messages, but the issue persists.

@Enchufa2

Enchufa2 commented Dec 2, 2021

A workaround would be to call redis.connection_pool.disconnect() before performing any operation that follows a long pause during which a reset may have happened.

@Andrew-Chen-Wang
Collaborator

Andrew-Chen-Wang commented Dec 2, 2021

@Enchufa2 thanks for running the test. What do you mean by "automagically"? Is there code that they implement that we somehow missed, or code that we did port that is not functioning properly? Does the internal connection not handle this properly?

@Enchufa2

Enchufa2 commented Dec 2, 2021

I mean that redis somehow figures out that the connection is broken, disposes of it, and opens a new one instead of failing. I'm not sure how redis does this, and therefore what's missing here, because unfortunately I'm not familiar with either codebase. But redis's behaviour is what I would expect from a connection pool. Otherwise, one needs to think about whether there are connections and whether they are alive, which is contrary to the very abstraction of a connection pool, right?

@Andrew-Chen-Wang
Collaborator

Andrew-Chen-Wang commented Dec 2, 2021

@Enchufa2 can you change this:

https://github.com/aio-libs/aioredis-py/blob/dbdd0add63f986f2ed2d56c9736303d133add23c/aioredis/connection.py#L850

to if not self.is_connected:

Redis is checking using self._sock, but we don't use self._sock. This could be the underlying reason, though I'm not sure how we didn't catch this early on or if a PR changed this somehow.

@Enchufa2

Enchufa2 commented Dec 2, 2021

Nope, this doesn't help, same error. By the way, I noticed that I switched the errors coming from the current release and the current master branch in my previous comment. Apologies, the comment is amended now.

@Enchufa2

Enchufa2 commented Dec 2, 2021

This is interesting. It turns out I had redis v3.5.3 installed, and this is the version that recovers from connection aborts and resets. I just updated to v4.0.2 and it shows this same issue.

@Enchufa2

Enchufa2 commented Dec 2, 2021

Reported in redis/redis-py#1772

@Andrew-Chen-Wang
Collaborator

@Enchufa2 I'm unable to reproduce this error in either the redis main/master branch or aioredis==2.0.0 (https://github.com/Andrew-Chen-Wang/aioredis-issue-778). Unlike ss, I'm using CLIENT KILL and CLIENT KILL ADDR, with no success. When doing CLIENT LIST, in both cases a new connection is established, as shown by an incremented ID. I was using gitpod since I don't have a tool similar to ss on Mac for killing sockets. There may be a chance CLIENT KILL gave aioredis a warning beforehand, but if a Connection reset by peer is occurring, then I can't imagine functionality similar to CLIENT KILL not being performed.

@Enchufa2

Enchufa2 commented Dec 2, 2021

If I use CLIENT KILL, I see this with the current master branch:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1064, in execute_command
    return await self.parse_response(conn, command_name, **options)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1080, in parse_response
    response = await connection.read_response()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 854, in read_response
    response = await self._parser.read_response()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 367, in read_response
    raw = await self._buffer.readline()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 301, in readline
    await self._read_from_socket()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 250, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
aioredis.exceptions.ConnectionError: Connection closed by server.

@cjdsellers

We were also seeing this in production with aioredis==2.0.0. Our solution was to remove the connection pool and rely on single connections (although this isn't ideal).

@Andrew-Chen-Wang I see you were unable to reproduce, is there any current plan to look into this further? Or would you like someone to put together a PR with a potential fix?

@Andrew-Chen-Wang
Collaborator

@cjdsellers I still can't reproduce this issue, so a PR is much appreciated as we're down a maintainer now and I'm stuck with work.

@Enchufa2

> gotcha, to be clear it's Connection's retry_on_error that resolves this right?

No, it's call_with_retry, from the Retry class. The issue is two-fold:

  1. When aioredis tries to use a connection, it retries only if there's a timeout. Otherwise, it simply fails. redis implements call_with_retry, which always retries, whatever happens, and then triggers a disconnect if everything goes wrong. Then, the disconnect method tests the socket, ignoring OSError. So it doesn't fail on resets and aborts, and reconnection can happen.
  2. When aioredis or redis asks for a new connection from the pool, if the connection is down due to a reset or an abort, OSError is raised. This is what redis/redis-py#1832 fixes.

@Andrew-Chen-Wang added the "bug" label and removed the "need investigation" label (Need to look into described issue.) on Jan 12, 2022
@Andrew-Chen-Wang
Collaborator

Andrew-Chen-Wang commented Jan 12, 2022

@Enchufa2 OK, I didn't think it was that, as we had implemented all of call_with_retry, but as it turns out, in the port we forgot to await the fail/disconnect call in call_with_retry... It's now fixed in #1156. Could you please test it one more time for me? Thank you so much! (I've also applied redis/redis-py#1832 to #1156.)

@Enchufa2

We're still in the same spot with those changes:

Traceback (most recent call last):
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 802, in disconnect
    await self._writer.wait_closed()  # type: ignore[union-attr]
  File "/usr/lib64/python3.10/asyncio/streams.py", line 344, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 853, in send_packed_command
    await asyncio.wait_for(
  File "/usr/lib64/python3.10/asyncio/tasks.py", line 408, in wait_for
    return await fut
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 835, in _send_packed_command
    await self._writer.drain()
  File "/usr/lib64/python3.10/asyncio/streams.py", line 360, in drain
    raise exc
  File "/usr/lib64/python3.10/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionAbortedError: [Errno 103] Software caused connection abort

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/***/redis-issue/test-aioredis.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.10/asyncio/base_events.py", line 641, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1173, in execute_command
    return await conn.retry.call_with_retry(
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/retry.py", line 55, in call_with_retry
    await fail(error)
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1162, in _disconnect_raise
    raise error
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/retry.py", line 52, in call_with_retry
    return await do()
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1151, in _send_command_parse_response
    await conn.send_command(*args)
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 878, in send_command
    await self.send_packed_command(
  File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 867, in send_packed_command
    raise ConnectionError(
aioredis.exceptions.ConnectionError: Error 103 while writing to socket. Software caused connection abort.

Now I'm a bit lost. How does a disconnect involve a send_packed_command? Or am I reading the trace incorrectly?

@Andrew-Chen-Wang
Collaborator

Take a look at what Retry is set to in your connection object: https://github.com/aio-libs/aioredis-py/blob/cb93ef4a6fa80f3d32d78aced58d8df196aa58e1/aioredis/connection.py#L629

In the stack trace, it did fail. My guess is you had retry_on_timeout implemented so you got 1 retry. Retry performed the send command function again. I guess what I don't understand is why the exception still wasn't caught. Maybe the Retry class also needs to catch OSError? Using a debugger would probably help to check whether the disconnected connection was being reused.

@Enchufa2

Enchufa2 commented Jan 13, 2022

Were you able to reproduce the error? If not, I think we should focus on that first. :) You need to find a way to reset or abort a TCP connection on macOS, or otherwise test this on a Linux box, where you can use ss, which aborts the connection. AFAIK, there's no equivalent on macOS. But there's the tcpkill utility, from dsniff, which seems to work on macOS too (at least this fork).

I just tried tcpkill myself on Linux and it works nicely by resetting the connection (exactly the same issue I've found in production). There are two things you need to do for this to work. First, launch tcpkill:

$ sudo tcpkill -i lo port 6379

Here tcpkill keeps listening on the loopback interface and will reset any TCP connection to port 6379 (if your Redis host is not local, just change the interface accordingly). Second, you need to force an IPv4 connection (because tcpkill doesn't understand IPv6). Again for localhost, you just need to specify redis = Redis(host="127.0.0.1") in my test case above.
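
If neither ss nor tcpkill is at hand, a reset can also be provoked with the standard library alone. The sketch below is not from this thread and does not involve Redis: it starts a throwaway local server that closes its side with SO_LINGER set to zero, which makes the kernel send RST instead of FIN. The function names are made up for the example, and the behaviour is what one would expect on Linux; other platforms may report the failure differently.

```python
import asyncio
import socket
import struct
from typing import Optional


async def _reset_after_first_byte(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
    await reader.read(1)  # wait until the client has sent something
    sock = writer.get_extra_info("socket")
    # SO_LINGER with l_onoff=1, l_linger=0 makes close() send RST instead of FIN.
    # The "ii" layout is for Linux/macOS; Windows packs this struct differently.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    writer.transport.abort()


async def provoke_reset() -> Optional[ConnectionResetError]:
    """Connect to a throwaway local server and return the reset error, if any."""
    server = await asyncio.start_server(_reset_after_first_byte, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"x")
        await writer.drain()
        try:
            await reader.read(100)  # the peer answers with RST, not data
        except ConnectionResetError as exc:
            return exc
        finally:
            writer.close()
        return None
    finally:
        server.close()


if __name__ == "__main__":
    print(repr(asyncio.run(provoke_reset())))
```

Pointing a Redis client at such a server would not get past the protocol handshake, so this is only useful for checking how client-side code reacts to the reset itself.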

@cjdsellers

Seems like you guys are really close to a solution on this? Thank you for all the effort here!

@steve-marmalade

Now that Aioredis is in redis-py 4.2.0rc1, is the expectation that this issue will go away once we migrate to redis? Any idea when a non-rc version will be released? Thanks for all your hard work!

@Olegt0rr
Contributor

Olegt0rr commented Mar 9, 2022

Still relevant:

File "aioredis/client.py", line 1082, in execute_command
    conn = self.connection or await pool.get_connection(command_name, **options)
  File "aioredis/connection.py", line 1422, in get_connection
    if await connection.can_read():
  File "aioredis/connection.py", line 893, in can_read
    return await self._parser.can_read(timeout)
  File "asyncio/streams.py", line 665, in read
    raise self._exception
ConnectionResetError: [Errno 104] Connection reset by peer

@adrienyhuel

> @Enchufa2 OK, I didn't think it was that, as we had implemented all of call_with_retry, but as it turns out, in the port we forgot to await the fail/disconnect call in call_with_retry... It's now fixed in #1156. Could you please test it one more time for me? Thank you so much! (I've also applied redis/redis-py#1832 to #1156.)

@Andrew-Chen-Wang
I see that those fixes are in pull requests in aioredis, but not in redis-py asyncio.
Will this be ported too?
Currently I'm stuck with aioredis 1.3.1 due to ConnectionResetError :(

@Andrew-Chen-Wang
Collaborator

I've lost a significant amount of time to work, so I'm not sure when it'll be ported over. Maybe sometime in June, unless someone else would like to help out. Anyone can help!

@steve-marmalade

Hi @Andrew-Chen-Wang could you point me in the direction of the changes that need to be ported over? We have currently disabled connection pooling due to the stability issues, and it definitely has introduced a performance overhead.

@Andrew-Chen-Wang
Collaborator

Andrew-Chen-Wang commented May 25, 2022

Hi @steve-marmalade, please head over to redis/redis-py to ask this question. I'm fairly busy lately, but I'll try to answer when I find the time. Sorry!

Edit: sorry! I should have looked at this more carefully.

@Andrew-Chen-Wang
Collaborator

@chayim do you think you can port over the PR that fixed this issue? Thanks!

@steve-marmalade

Gentle bump @Andrew-Chen-Wang, @chayim

@Olegt0rr
Contributor

> Gentle bump @Andrew-Chen-Wang, @chayim

Just use the main redis library. This issue does not occur there.

@chayim

chayim commented Jun 19, 2022

@Andrew-Chen-Wang what do you think of us (well, me) finding ways to move bugs between repos, at least for specific items?

@steve-marmalade

steve-marmalade commented Jun 22, 2022

@Olegt0rr, can you expand on:

> Just use the main redis library. This issue does not occur there.

We still see intermittent ConnectionErrors when using redis version 4.3.1 via from redis import asyncio as aioredis. I see another user posted:

> I see that those fixes are in pull requests in aioredis, but not in redis-py asyncio.
> Will this be ported too?

So I thought I was not alone in feeling that this issue hasn't been resolved in the other redis library.

@adrienyhuel

@steve-marmalade

I use redis-py async commands, and I had to write a simple wrapper to retry on connection resets:

import redis.asyncio as aioredis
from logging import getLogger

logger = getLogger('quart.app')


class AIORedisWrapper:

    REDIS_RETRY_LOG = 'Redis connection error, retrying...'

    def __init__(self, address):
        self.pool = aioredis.ConnectionPool.from_url(address, decode_responses=True)
        self.backend = aioredis.Redis(connection_pool=self.pool)

    async def get(self, key: str):
        try:
            return await self.backend.get(key)
        except (OSError, aioredis.ConnectionError):
            logger.warning(self.REDIS_RETRY_LOG)
            await self.pool.disconnect()
            return await self.backend.get(key)

    async def setex(self, name: str, value, time: int):
        try:
            return await self.backend.setex(name=name, value=value, time=time)
        except (OSError, aioredis.ConnectionError):
            logger.warning(self.REDIS_RETRY_LOG)
            await self.pool.disconnect()
            return await self.backend.setex(name=name, value=value, time=time)

    async def delete(self, key: str):
        try:
            return await self.backend.delete(key)
        except (OSError, aioredis.ConnectionError):
            logger.warning(self.REDIS_RETRY_LOG)
            await self.pool.disconnect()
            return await self.backend.delete(key)

@Andrew-Chen-Wang
Collaborator

@chayim I don't think we can transfer issues between orgs, and even if that's possible, I don't have the perms to do so.

What we could try to do is create new issues in redis-py that carry on the conversation from aioredis. We'd lock those aioredis issues and leave a comment saying to go to the redis-py repo. We can also add two labels: aioredis-native and redis-py. It's a bit manual, but I guess that's the problem with migrations. Thoughts?

@chayim

chayim commented Jun 26, 2022

@Andrew-Chen-Wang here are my thoughts, if you're game:

  1. We identify which issues we'd like to transfer (by #)
  2. We "archive" aioredis (you already have the pointer to redis-py)
  3. I script adding each issue from aioredis, including all comments, to redis-py. I'll try to add the users if possible. At the very least, I'd add a comment "originally ###" to ensure the cross-link happens.

WDYT?

@Andrew-Chen-Wang
Collaborator

@chayim sounds great, except I promised that, when I had the time at least, I'd revert aioredis to its original v1 form. Archiving would defeat the purpose of seeing if anyone preferred the old package.

I think points 1 and 3 are good enough. I don't see much activity on aioredis anymore anyway, since most people see the migration note. I'll also add the migration note to the docs, since people might be navigating there as well.

Besides point 2, it all sounds good to me!

@jonathansp

any workaround so far?

_wait_response
    inference_response = await async_response
redis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.
