-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MIME types: test 0x0B and 0x0C #13615
Conversation
There are no reviewers for this pull request besides its author. Please reach out on W3C's irc server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you! |
@MattMenke2 I'm not entirely sure what would be best. Perhaps you know the history behind how 0x0B and 0x0C are sometimes special in HTTP (both are called out by the HTTP specification as sometimes being equivalent in certain clients, but this being legacy). The MIME type parser treats 0x0C in a special manner because it's part of the definition of ASCII whitespace that most of the platform uses. Should the MIME type parser instead use the HTTP definition of whitespace (one that excludes 0x0C)? Doing that might be confusing to some non-HTTP consumers of that parser though. |
(To be clear, this question also affects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I don't know the history here. I wouldn't be surprised if this were inherited from how parsing of MIME types of email attachments are parsed, but that's purely speculation. For purely HTTP use, I'd rather treat 0x0C and 0x0D like any other character, if we can get away with it.
I'd be surprised if either character were common in non-HTTP consumers, just because it seems weird to randomly add whitespace characters that the average text editor/viewer has no idea what to do with, but that's again just speculation. Probably be best to consult someone familiar with those other contexts.
For this header, only Safari treats 0x0C as whitespace. (I'm assuming 0x0D was a typo above and meant to be 0x0B, which nobody treats specially here.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing things in whatwg-mimetype gives the following test mismatches:
● mime-types.json › text/html;charset=
gbk (text/html;charset=\vgbk)
Expected value to equal:
"UTF-8"
Received:
null
● mime-types.json › text/html;charset=
gbk (text/html;charset=\fgbk)
Expected value to equal:
"UTF-8"
Received:
null
● mime-types.json › text/html;
charset=gbk (text/html;\vcharset=gbk)
Expected value to equal:
"UTF-8"
Received:
null
● mime-types.json › text/html;
charset=gbk (text/html;\fcharset=gbk)
Expected value to equal:
"text/html;charset=gbk"
Received:
"text/html"
I think you need to remember that \u000B (\v) and \u000C (\f) are not HTTP token code points and so terminate the parsing algorithm in other ways.
Not sure how to reply with quotes, @domenic are you sure about that? It looks to me like in the MIME sniff spec, values with non-token characters are ignored on a per-subfield basis: If all of the following are true parameterName is not the empty string then set mimeType’s parameters[parameterName] to parameterValue. The earlier rules for type/subtype fail if they contain non-token characters, but the name/value fields are just ignored if they have non-token characters. |
Right, I guess "terminate" was imprecise. "Ignored the current construct" is what I meant. So for example, given Similarly, for |
Sorry, you're right. Was thinking we should be defaulting to UTF-8, but null is the default value. |
I added some additional tests here for "HTTP newlines" and 0x0B and 0x0C in failure conditions (when "part of" type or subtype). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests pass
No description provided.