Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib3 2.0: Double reading causes IncompleteRead error #295

Closed
frostming opened this issue May 7, 2023 · 0 comments · Fixed by #296
Closed

urllib3 2.0: Double reading causes IncompleteRead error #295

frostming opened this issue May 7, 2023 · 0 comments · Fixed by #296
Assignees

Comments

@frostming
Copy link
Contributor

frostming commented May 7, 2023

In the Serializer.dumps() method, the underlying fp is replaced with a new, never read stream, which causes the stream to be double read: https://github.com/ionrock/cachecontrol/blob/c05ef9eff1c9ac176481fb99e7a7188aa5b4e17b/cachecontrol/serialize.py#L35-L36

This will lead to an IncompleteRead error on urllib3 2.0 since it starts to strictly check the HTTPResponse.length_remaining attribute which will be a negative value(content_length - 2*content_length = -content_length).

I have no idea why this replacement is necessary.

This can be reproduced with the following minimal reproducible example:

from cachecontrol import CacheControl
from cachecontrol.serialize import Serializer as LegacySerializer
from requests import Session


class Serializer(LegacySerializer):
    def dumps(self, request, response, body=None):
        if not hasattr(response, "strict"):
            # XXX: urllib3 2.0 removes this attribute
            response.strict = False
        return super().dumps(request, response, body)

    def prepare_response(self, request, cached, body_file=None):
        # We don't need to pass strict to HTTPResponse
        cached["response"].pop("strict", None)
        return super().prepare_response(request, cached, body_file)


s = CacheControl(Session(), serializer=Serializer())
url = "https://pypi.org/simple/pdm-backend/"
headers = {"Cache-Control": "max-age=0"}
s.get(url, headers=headers).content
resp = s.get(url, headers=headers)
assert resp.from_cache
resp.content

The error:

Traceback (most recent call last):
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 705, in _error_catcher
    yield
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 830, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(4658 bytes read, -2329 more expected)

...(more tracebacks omitted)

Note that the read size 4658 is twice the content length 2329

Related to #264

frostming added a commit to frostming/cacheyou that referenced this issue May 8, 2023
frostming added a commit to frostming/cacheyou that referenced this issue May 8, 2023
frostming added a commit to frostming/cacheyou that referenced this issue May 18, 2023
frostming added a commit to frostming/cacheyou that referenced this issue May 18, 2023
woodruffw pushed a commit to woodruffw-forks/cachecontrol that referenced this issue May 22, 2023
@frostming frostming self-assigned this May 25, 2023
frostming added a commit to frostming/cacheyou that referenced this issue May 25, 2023
frostming added a commit that referenced this issue May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant