Documentation not mentioning treatment of invalid Content-Type headers in ClientResponse (mismatch of header content and property value) #9522

Open
1 task done
e-d-n-a opened this issue Oct 22, 2024 · 0 comments
Labels
bug good first issue Good for newcomers Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/

Describe the bug

Recently, a client script of mine fetched images and received responses (status 200) whose content_type property was 'text/plain', even though the body was still an image and not an error message (which was my first guess).

Only after a while did I check the actual header and see that the server sends the invalid value "jpg" instead of "image/jpg".

So it is confusing that ClientResponse.content_type gives a different result than ClientResponse.headers['Content-Type'].
The documentation suggests that these always match as long as the header is present.

It took me a while to find the origin of the problem in the code:
The 'Content-Type' header goes through an interpretation/parsing step, which can be found in the HeadersMixin used by ClientResponse.

To Reproduce

The behavior originates in email.parser.HeaderParser. The following shows how it handles different content-type values:

from email.parser import HeaderParser

hp = HeaderParser()
hp.parsestr('Content-Type: jpg').get_content_type()        # 'text/plain' (no "/", so the default is returned)
hp.parsestr('Content-Type: image/jpg').get_content_type()  # 'image/jpg'
hp.parsestr('Content-Type: /jpg').get_content_type()       # '/jpg' (a single "/" is enough to be accepted)

The last example shows that a single "/" is all it takes for the value to be accepted as valid.
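As a minimal sketch of how a client could notice this fallback, the helper below compares the raw header value with what the stdlib email parser derives from it. The function names parsed_content_type and is_fallback are hypothetical and not part of aiohttp. The sketch only assumes that the content_type property ends up with the same result as email.parser.HeaderParser, as described above.

```python
from email.parser import HeaderParser


def parsed_content_type(raw_value: str) -> str:
    """Return the type the stdlib email parser derives from a raw header value."""
    msg = HeaderParser().parsestr("Content-Type: " + raw_value)
    return msg.get_content_type()


def is_fallback(raw_value: str) -> bool:
    """Return True when the parsed type differs from the media type in the raw header.

    Hypothetical helper: parameters such as "; charset=utf-8" are ignored
    and the comparison is case-insensitive.
    """
    media_type = raw_value.split(";", 1)[0].strip().lower()
    return parsed_content_type(raw_value) != media_type


if __name__ == "__main__":
    for value in ("jpg", "image/jpg", "/jpg"):
        print(repr(value), parsed_content_type(value), is_fallback(value))
```

In a client, the raw value would come from the response headers, so a mismatch can be logged or handled before trusting the parsed property.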

The code in master is still the same as in release v3.9.3.

Expected behavior

Either

ClientResponse.content_type and ClientResponse.headers['Content-Type'] should always match when the header is present.

or

A remark should be added to the documentation that an invalid 'Content-Type' header results in a value of 'text/plain' for the content_type property, reflecting the behavior of email.parser.HeaderParser, or rather the specification in RFC 2045.

Logs/tracebacks

Response-headers of an image with invalid content-type:

HTTP/2 200 OK
server: myracloud
date: Tue, 22 Oct 2024 13:25:15 GMT
content-type: jpg
x-goog-generation: 1727331007965793
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 106629
x-goog-hash: crc32c=3AAhdw==
x-goog-hash: md5=bwW1tfx11avS0Rl0N3fPPQ==
x-goog-storage-class: STANDARD
access-control-allow-origin: *
x-guploader-uploadid: AHmUCY0B28EEcIBUrivfD9eVMQBcVCRH8GmlD9RQBdM-8-vCNwUb3SP6k79NsaKN2NeuMypHNsQ
expires: Fri, 18 Oct 2024 19:38:01 GMT
cache-control: public, max-age=3600
last-modified: Thu, 26 Sep 2024 06:10:07 GMT
etag: "6f05b5b5fc75d5abd2d119743777cf3d"
age: 0
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
strict-transport-security: max-age=63072000; includeSubDomains; preload
x-cdn: 1
X-Firefox-Spdy: h2

Response as interpreted by Firefox (see Response tab of request in devtools):
Name: gcp84aaccede5fd42bb831900c9a7b6fbd0.jpg
Dimensions: 1500 × 1125
MIME Type: image/jpeg

Python Version

$ python --version
Python 3.9.17

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: c:\program files\python39\lib\site-packages
Requires: aiosignal, async-timeout, attrs, frozenlist, multidict, yarl
Required-by:

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: c:\program files\python39\lib\site-packages
Requires:
Required-by: aiohttp, yarl

propcache Version

$ python -m pip show propcache
WARNING: Package(s) not found: propcache

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.3
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: c:\program files\python39\lib\site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Windows

Related component

Client

Additional context

It's interesting that a standard browser such as Mozilla Firefox can easily deal with such a response and correctly interpret the content as an image.
I'm not sure whether it would be a valid request for aiohttp to also determine the correct content type from the URL, the body, or the invalid content-type header.
The header example in the logs shows that it is probably not trivial to confirm, from the headers alone, that the response is indeed an image.
The simplest solution would probably be to transform common image subtypes (avif, webp, jpg/jpeg, png, gif) found in the header into valid MIME types.
But I'm not sure how common invalid content types from servers actually are.
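The idea of mapping bare image subtypes to valid MIME types, with a fallback to the URL, could be sketched roughly as follows. This is not aiohttp code. The name repair_content_type and the mapping table are hypothetical, and the URL fallback uses the stdlib mimetypes module, which only looks at the file extension.

```python
import mimetypes
from typing import Optional

# Hypothetical mapping from bare subtypes to full media types.
BARE_IMAGE_SUBTYPES = {
    "avif": "image/avif",
    "webp": "image/webp",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
}


def repair_content_type(raw_value: str, url: Optional[str] = None) -> Optional[str]:
    """Best-effort repair of a Content-Type value; returns None if nothing fits."""
    media_type = raw_value.split(";", 1)[0].strip().lower()
    # Same leniency as the email parser: exactly one "/" counts as valid.
    if media_type.count("/") == 1:
        return media_type
    # A bare subtype such as "jpg" is mapped to a full media type.
    if media_type in BARE_IMAGE_SUBTYPES:
        return BARE_IMAGE_SUBTYPES[media_type]
    # Last resort: guess from the file extension in the URL.
    if url is not None:
        guessed, _encoding = mimetypes.guess_type(url)
        return guessed
    return None
```

Returning None instead of 'text/plain' leaves the decision about a default to the caller, which keeps the repaired value distinguishable from a genuine 'text/plain' response.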

Why does the content_type property return a simple (but interpreted) string rather than a MimeType instance?
And why not use parse_mimetype() from helpers.py instead of email.parser.HeaderParser?

According to Section 7.2.1 of RFC 2616 (see the last paragraph), a 'Content-Type' header that is present should NOT be interpreted.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@e-d-n-a e-d-n-a added the bug label Oct 22, 2024
@Dreamsorcerer Dreamsorcerer added good first issue Good for newcomers Hacktoberfest We think it's good for https://hacktoberfest.digitalocean.com/ labels Oct 22, 2024
@github-staff github-staff deleted a comment Oct 29, 2024