-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do separators need to be preserved when parsing? #39
Comments
I was hoping we could simplify this at least somewhat, but it's indeed different from what implementations do. They mostly pass through the input (or remove everything but "charset"), which seems rather broken. |
If I remember well, this was necessary to pass this XMLHttpRequest web platform test : |
Well, when you set the |
You need to parse it and replace charset per https://xhr.spec.whatwg.org/#dom-xmlhttprequest-send step 4 "otherwise". |
Ah, my bad. That's a very good point. Thanks! |
This is actually a rather big problem as this seemingly preserves all kinds of garbage. Parameters without values are not really a thing per the official MIME type definition for instance. @bzbarsky do you know to what extent we need to preserve the exact behavior of https://www.w3c-test.org/XMLHttpRequest/send-content-type-charset.htm? Do we need to take some other approach to MIME types than we do for other formats? Not have a clear parse / model / serialize separation? |
We could of course have a one-off parser just for XMLHttpRequest's |
Good point, and @yutakahirano. |
This doesn't sound like a text replacement. I'm under an impression that equivalent representations such as extra spaces or quotation are also allowed. |
Yeah, but it so happens that's not what implementations do (or what we test for). If that is web compatible there might not be an issue here though. |
It's also not clear to me that an internal representation that supports duplicate parameter names is a good one. |
So, things that we'll need to go test:
|
I suppose we're all hoping that all APIs could either pass through strings verbatim, or parse+serialize. But it looks like what's actually implemented in some cases is something funkier? |
Do not merge, testing only.
About https://wpt.fyi/XMLHttpRequest/send-content-type-charset.htm, there's impressively little agreement about some of the cases, which I guess is good news in a way if we want to change things. I added some cases in web-platform-tests/wpt#7882 to see what happens to whitespace after semicolon. |
Well, verbatim isn't quite an option for the XHR stuff at least. And in general it'd be much nicer to do parse+serialize. (E.g., it would match how URLs are handled.) In general it seems likely that changing this stuff is web-compatible, yes :). I am hopeful that @annevk's work will create a model everyone can work toward. |
OK, so from web-platform-tests/wpt#7882 we can conclude:
So, yes, separators do need to be preserved when parsing! |
Well, if it's web-compatible I'd much rather not preserve them. Keeping syntax around in the object representation is rather ugly. And if you cared about preserving existing behavior to that extent we'd end up with several different MIME type parsers/scanners rather than a single unified model. Existing code bases aren't exactly great. |
Are there other contexts where whitespace isn't preserved? It's very hard to guess what the compat implications of changing that are. Presumably implementations don't have a clear parse / model / serialize separation here, but rather something like "fix up MIME string" and "extract information from MIME string"? |
The current chromium implementation is "search-and-replace" which I don't love. I would like to replace the implementation with a parse-then-serialize impl when I have time, but adding whitespace preservation as a requirement will make the parse-then-serialize implementation much harder. |
@yutakahirano, if you'd like to implement and ship such a model, that makes me more optimistic about it. I have no idea how to estimate the compat risk of this, but I suppose a start would be to see what kind of normalization (space after semicolon or not) would result in the fewest changes in the wild. |
I added a use counter to Chromium. It will tell us how risky it is to change the current behavior. |
This CL introduces mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that is actually conforming to the spec but was rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086
This CL introduces a mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that are actually conforming to the spec but were rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086
This CL introduces a mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that are actually conforming to the spec but were rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086
This CL introduces a mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that are actually conforming to the spec but were rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086
This CL introduces a mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that are actually conforming to the spec but were rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086
To be clear, my hope is that we can resolve this issue by changing XMLHttpRequest and the test in web-platform-tests: web-platform-tests/wpt#8422. I'd rather not preserve separator syntax, duplicate parameters, and invalid parameters as required by those tests currently. That seems way too much complexity for a such a niche use case. |
I think we've reached agreement here. Tests have been written as well. What's still missing is an update to the XMLHttpRequest Standard, but I'll write that after we land #36 I suspect. |
Spinning off from #36 to discuss this specific issue.
In https://github.com/jsdom/content-type-parser we, for some reason, preserved the separator between MIME type parameters. (So, e.g., it maybe
;
or;
with a space or something else.) This means that when you parse-then-serialize, the separators are preserved.I believe this might have been necessary to pass some of the XMLHttpRequest or File API web platform tests? I'm not sure; perhaps @nicolashenry, the original author of that code, remembers.
Browsing through usages in jsdom it appears we make use of this ability in FileReader, which (at least in our implementation) parses-then-serializes the Blob's
type
when creating thedata:
URL. We also use it when creating aBlob
xhr.response
, to set the blob'stype
value from the parse-then-serialize of the Content-Type header.Maybe there are web platform tests that assume this, but should not, because it's pretty silly?
I suppose we could try changing this in jsdom and see what tests start failing.
The text was updated successfully, but these errors were encountered: