Documentation not mentioning treatment of invalid Content-Type headers in ClientResponse (mismatch of header content and property value) #9522
Labels: bug, good first issue, Hacktoberfest
Describe the bug
Recently, a client script of mine received image-fetch responses (status=200) whose `content_type` property reported 'text/plain', although the body was still an image and not an error message (as would have been my first guess).
Only after a while did I check the actual header and see that the server sends an invalid value of just "jpg" instead of "image/jpg".
So it's confusing that `ClientResponse.content_type` gives a different result than `ClientResponse.headers['Content-Type']`.
In the documentation, these seem to always match, as long as the header is present.
It took me a while to find the origin of the problem in the code:
There is actually an interpretation/parsing step for the 'Content-Type' header, which can be found in the `HeadersMixin` used by `ClientResponse`.
To Reproduce
With the issue originally stemming from `email.parser.HeaderParser`, this shows the different handling of possible content-type values:
With the last example showing that only a "/" is required to make it valid.
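A minimal sketch of this kind of comparison, assuming a bare 'Content-Type' header is handed to the standard-library `email.parser.HeaderParser`. The helper name `interpreted_content_type` and the sample values are illustrative assumptions, not the ones from the original report:

```python
from email.parser import HeaderParser


def interpreted_content_type(raw: str) -> str:
    """Return the content type HeaderParser derives from a raw header value."""
    # Build a one-header message and let the email package interpret it.
    msg = HeaderParser().parsestr(f"Content-Type: {raw}\n\n")
    # get_content_type() falls back to 'text/plain' unless the value
    # contains exactly one "/" (RFC 2045, section 5.2).
    return msg.get_content_type()


# Illustrative header values; the last one consists of nothing but "/".
for raw in ("image/jpeg", "image/jpg", "jpg", "/"):
    print(f"{raw!r:>14} -> {interpreted_content_type(raw)!r}")
```

With these inputs, only the bare "jpg" should come back as 'text/plain'; the other values, including the lone "/", are returned unchanged.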
Code in master is still the same as of release v3.9.3.
Expected behavior
Either
`ClientResponse.content_type` and `ClientResponse.headers['Content-Type']` should always match when the header is present,
or
you should add a remark to the documentation that an invalid 'Content-Type' header will result in a value of 'text/plain' for the `content_type` property, reflecting the behavior of `email.parser.HeaderParser`, or rather the specs of RFC 2045.
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
propcache Version
yarl Version
OS
Windows
Related component
Client
Additional context
It's interesting that a standard browser such as Mozilla Firefox can easily deal with such a response and correctly interpret the content as an image.
I'm not sure whether it would be a valid request for `aiohttp` to also be able to determine the correct content type based on the URL, the body, or the invalid content-type header.
You can see from the header example in the logs that it's probably not trivial to confirm (from the headers alone) that the response is indeed an image.
The simplest solution would probably be to transform common image subtypes (avif, webp, jpg/jpeg, png, gif) found in the header into valid MIME types.
But I'm not sure how common the issue of invalid content-types from a server actually is.
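As a rough sketch of that idea, assuming it were applied by client code before trusting the header (the names `BARE_IMAGE_SUBTYPES` and `normalize_content_type` are hypothetical and not part of aiohttp). Note that the registered MIME type for JPEG is `image/jpeg`, so both "jpg" and "jpeg" map to it here:

```python
# Hypothetical mapping of bare image subtypes to full MIME types;
# this is an illustration, not something aiohttp provides.
BARE_IMAGE_SUBTYPES = {
    "avif": "image/avif",
    "webp": "image/webp",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
}


def normalize_content_type(raw: str) -> str:
    """Map a bare image subtype such as 'jpg' to a full MIME type."""
    # Drop any parameters (e.g. '; charset=...') and normalise case.
    value = raw.split(";", 1)[0].strip().lower()
    if "/" in value:
        return value  # already has the type/subtype shape, keep it
    # Unknown bare values fall back to 'text/plain', the same default
    # that RFC 2045 prescribes for invalid values.
    return BARE_IMAGE_SUBTYPES.get(value, "text/plain")


print(normalize_content_type("jpg"))
```

A lookup table like this only covers the listed subtypes; anything else would still end up as 'text/plain'.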
Why does the `content_type` property not return a `MimeType` instance, though, rather than a simple (but interpreted) string?
Instead of using `email.parser.HeaderParser`, why not use `parse_mimetype()` from `helpers.py`?
According to Section 7.2.1 of RFC 2616 (see last paragraph), a present 'Content-Type' header should NOT be interpreted.