Do separators need to be preserved when parsing? #39

domenic · 2017-10-10T15:17:27Z

Spinning off from #36 to discuss this specific issue.

In https://github.com/jsdom/content-type-parser we, for some reason, preserved the separator between MIME type parameters. (So, e.g., it may be `;`, or `;` followed by a space, or something else.) This means that when you parse-then-serialize, the separators are preserved.
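
To make the difference concrete, here is a minimal sketch in Python (not the actual content-type-parser code) of a parser that keeps each separator next to its parameter, alongside a serializer that throws the separators away. The function names are hypothetical, and quoted parameter values containing `;` are ignored for simplicity.

```python
import re


def parse_preserving(value):
    """Split a MIME type string into its essence and (separator, parameter) pairs."""
    # A capturing group makes re.split keep every separator in the result.
    pieces = re.split(r"(\s*;\s*)", value)
    essence = pieces[0]
    # Odd indices hold the separators, even indices (from 2) the raw parameters.
    params = list(zip(pieces[1::2], pieces[2::2]))
    return essence, params


def serialize_preserving(essence, params):
    """Re-emit each parameter with the exact separator that preceded it."""
    return essence + "".join(sep + param for sep, param in params)


def serialize_normalized(essence, params):
    """Discard the stored separators and always emit a bare ';'."""
    return essence + "".join(";" + param for _, param in params)


original = "text/html;charset=utf-8 ;  foo=bar"
essence, params = parse_preserving(original)
round_tripped = serialize_preserving(essence, params)
normalized = serialize_normalized(essence, params)
```

With the preserving serializer the round trip returns the input unchanged; with the normalizing one the irregular whitespace around the second `;` is lost.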

I believe this might have been necessary to pass some of the XMLHttpRequest or File API web platform tests? I'm not sure; perhaps @nicolashenry, the original author of that code, remembers.

Browsing through usages in jsdom it appears we make use of this ability in FileReader, which (at least in our implementation) parses-then-serializes the Blob's type when creating the data: URL. We also use it when creating a Blob xhr.response, to set the blob's type value from the parse-then-serialize of the Content-Type header.

Maybe there are web platform tests that assume this, but should not, because it's pretty silly?

I suppose we could try changing this in jsdom and see what tests start failing.


annevk · 2017-10-10T16:09:55Z

I was hoping we could simplify this at least somewhat, but it's indeed different from what implementations do. They mostly pass through the input (or remove everything but "charset"), which seems rather broken.

nicolashenry · 2017-10-10T20:13:55Z

If I remember correctly, this was necessary to pass this XMLHttpRequest web platform test:
https://github.com/w3c/web-platform-tests/blob/master/XMLHttpRequest/send-content-type-charset.htm

annevk · 2017-10-11T03:14:17Z

Well, when you set the Content-Type header no MIME type parsing should take place (at least not at that level). So if that's the test it would indicate a bug in jsdom of sorts.

domenic · 2017-10-11T03:37:36Z

You need to parse it and replace charset per https://xhr.spec.whatwg.org/#dom-xmlhttprequest-send step 4 "otherwise".
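
A rough sketch of what "parse it and replace charset" could look like with a clean parse / model / serialize split. This is an illustration in Python under simplifying assumptions, not the algorithm from the XMLHttpRequest Standard: the helper names are hypothetical, parameters without a value are dropped, the first of any duplicate names wins, and quoted values are unquoted without being re-quoted on output.

```python
def parse_mime_type(value):
    """Parse into (essence, parameters); separators are not remembered."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if not sep:
            continue  # assumption: parameters without a value are dropped
        name = name.strip().lower()
        if name and name not in params:  # assumption: first duplicate wins
            params[name] = val.strip().strip('"')
    return essence.lower(), params


def serialize_mime_type(essence, params):
    """Serialize with a fixed ';' separator, whatever the input used."""
    return essence + "".join(f";{name}={val}" for name, val in params.items())


def replace_charset(header_value, encoding="UTF-8"):
    """Set charset to `encoding` unless it already matches case-insensitively."""
    essence, params = parse_mime_type(header_value)
    if "charset" in params and params["charset"].lower() != encoding.lower():
        params["charset"] = encoding
    return serialize_mime_type(essence, params)


replaced = replace_charset("text/plain; charset=bogus ;x=y")
unchanged = replace_charset("text/plain;charset=utf-8")
```

Note that the output separator is always `;` here, so the original spacing does not survive, which is exactly the behavior under discussion.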

annevk · 2017-10-11T03:46:12Z

Ah, my bad. That's a very good point. Thanks!

annevk · 2017-10-11T10:43:13Z

This is actually a rather big problem as this seemingly preserves all kinds of garbage. Parameters without values are not really a thing per the official MIME type definition for instance.

@bzbarsky do you know to what extent we need to preserve the exact behavior of https://www.w3c-test.org/XMLHttpRequest/send-content-type-charset.htm? Do we need to take some other approach to MIME types than we do for other formats? Not have a clear parse / model / serialize separation?

annevk · 2017-10-11T10:45:32Z

We could of course have a one-off parser just for XMLHttpRequest's Content-Type request header's charset parameter processing, but that does not seem great.

domenic · 2017-10-12T16:38:43Z

Maybe @tyoshino / @ricea have a perspective here.

annevk · 2017-10-12T16:47:05Z

Good point, and @yutakahirano.

yutakahirano · 2017-10-13T01:56:47Z

> ...then set all the `charset` parameters whose value is not a byte-case-insensitive match for encoding of that header’s value to encoding.

This doesn't sound like a text replacement. I'm under the impression that equivalent representations, such as ones with extra spaces or quotation marks, are also allowed.

annevk · 2017-10-13T08:22:57Z

Yeah, but it so happens that's not what implementations do (or what we test for). If that is web compatible there might not be an issue here though.

annevk · 2017-10-13T08:23:49Z

It's also not clear to me that an internal representation that supports duplicate parameter names is a good one.
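
As an illustration of the two possible internal representations, the following Python sketch contrasts a list of pairs, which can hold duplicate names, with a map, which cannot. The function names are hypothetical, and keeping the first occurrence in the map is an assumption made for the example rather than something settled in this thread.

```python
def params_as_pairs(value):
    """Keep every parameter in order, duplicates included."""
    pairs = []
    for raw in value.split(";")[1:]:
        name, _, val = raw.strip().partition("=")
        pairs.append((name.lower(), val))
    return pairs


def params_as_map(value):
    """Collapse duplicates; assumption: the first occurrence wins."""
    result = {}
    for name, val in params_as_pairs(value):
        result.setdefault(name, val)
    return result


raw = "text/plain;charset=a;charset=b"
pairs = params_as_pairs(raw)
mapping = params_as_map(raw)
```

The list form is what a representation needs if duplicate `charset` parameters must all survive a round trip; the map form is simpler but loses the second one.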

foolip · 2017-10-18T14:14:45Z

So, things that we'll need to go test:

XMLHttpRequest
File API
Anything for Fetch?

foolip · 2017-10-18T14:16:04Z

I suppose we're all hoping that all APIs could either pass through strings verbatim, or parse+serialize. But it looks like what's actually implemented in some cases is something funkier?


foolip · 2017-10-18T14:50:18Z

About https://wpt.fyi/XMLHttpRequest/send-content-type-charset.htm, there's impressively little agreement about some of the cases, which I guess is good news in a way if we want to change things.

I added some cases in web-platform-tests/wpt#7882 to see what happens to whitespace after semicolon.

domenic · 2017-10-18T16:38:24Z

> I suppose we're all hoping that all APIs could either pass through strings verbatim, or parse+serialize. But it looks like what's actually implemented in some cases is something funkier?

Well, verbatim isn't quite an option for the XHR stuff at least. And in general it'd be much nicer to do parse+serialize. (E.g., it would match how URLs are handled.)

In general it seems likely that changing this stuff is web-compatible, yes :). I am hopeful that @annevk's work will create a model everyone can work toward.

foolip · 2017-10-25T14:52:39Z

OK, so from web-platform-tests/wpt#7882 we can conclude:

Everyone preserves space (or lack of space) before/after semicolon
Chrome+Safari turn both charset=utf-8 and charset=bogus into charset=UTF-8
Edge doesn't turn bogus charsets into UTF-8 or uppercase charset=utf-8 into UTF-8
Firefox turns charset=bogus into charset=UTF-8, but doesn't uppercase charset=utf-8

So, yes, separators do need to be preserved when parsing!

annevk · 2017-10-25T15:07:25Z

Well, if it's web-compatible I'd much rather not preserve them. Keeping syntax around in the object representation is rather ugly. And if you cared about preserving existing behavior to that extent we'd end up with several different MIME type parsers/scanners rather than a single unified model. Existing code bases aren't exactly great.

foolip · 2017-10-25T15:28:37Z

Are there other contexts where whitespace isn't preserved? It's very hard to guess what the compat implications of changing that are. Presumably implementations don't have a clear parse / model / serialize separation here, but rather something like "fix up MIME string" and "extract information from MIME string"?

yutakahirano · 2017-10-25T23:55:20Z

The current chromium implementation is "search-and-replace" which I don't love. I would like to replace the implementation with a parse-then-serialize impl when I have time, but adding whitespace preservation as a requirement will make the parse-then-serialize implementation much harder.
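
For readers unfamiliar with the "search-and-replace" style, here is a hedged Python sketch of the general idea. It is not Chromium's code; the regular expression and function name are made up for illustration. It rewrites only the charset value in place, so everything else in the string, including whitespace and duplicate parameters, passes through untouched.

```python
import re

# Hypothetical pattern: "charset", optional spaces around "=", then the value
# up to the next ";" or whitespace. It does not check parameter boundaries.
CHARSET_RE = re.compile(r"(charset\s*=\s*)([^;\s]*)", re.IGNORECASE)


def search_and_replace_charset(header_value, encoding="UTF-8"):
    """Replace every non-matching charset value without reparsing the string."""
    def fix(match):
        if match.group(2).lower() == encoding.lower():
            return match.group(0)  # already matches: leave the text as written
        return match.group(1) + encoding

    return CHARSET_RE.sub(fix, header_value)


spaced = search_and_replace_charset("text/plain ;  charset=bogus; x=y")
duplicated = search_and_replace_charset("text/plain;charset=a;charset=b")
kept = search_and_replace_charset("text/plain;charset=utf-8")
```

This kind of approach preserves separators for free, which is why requiring the same output from a parse-then-serialize implementation means carrying the separators in the model.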

foolip · 2017-10-26T11:36:13Z

@yutakahirano, if you'd like to implement and ship such a model, that makes me more optimistic about it. I have no idea how to estimate the compat risk of this, but I suppose a start would be to see what kind of normalization (space after semicolon or not) would result in the fewest changes in the wild.

yutakahirano · 2017-11-15T05:14:53Z

I added a use counter to Chromium. It will tell us how risky it is to change the current behavior.

This CL introduces a mime type parser and stringifier to wpt/XMLHttpRequest/send-content-type-charset in order to accept implementations that are actually conforming to the spec but were rejected by the test due to some text representation errors. Bug: whatwg/mimesniff#39 Change-Id: I99466e2e596bb9c1b7f11267ad4ff0a886913086

annevk · 2017-11-24T15:33:10Z

To be clear, my hope is that we can resolve this issue by changing XMLHttpRequest and the test in web-platform-tests: web-platform-tests/wpt#8422.

I'd rather not preserve separator syntax, duplicate parameters, and invalid parameters as required by those tests currently. That seems way too much complexity for such a niche use case.

annevk · 2017-11-29T14:31:51Z

I think we've reached agreement here. Tests have been written as well. What's still missing is an update to the XMLHttpRequest Standard, but I'll write that after we land #36 I suspect.

annevk mentioned this issue Oct 12, 2017

Define data: URLs whatwg/fetch#579

Merged

7 tasks

annevk mentioned this issue Oct 13, 2017

Sort out MIME type tests #42

Closed

4 tasks

annevk added the topic: mime type label Oct 16, 2017

foolip added a commit to web-platform-tests/wpt that referenced this issue Oct 18, 2017

Test stuff for whatwg/mimesniff#39

0b58da8

Do not merge, testing only.

foolip mentioned this issue Oct 25, 2017

Revamp MIME type section #36

Merged

3 tasks

chromium-wpt-export-bot mentioned this issue Nov 16, 2017

[XHR] Introduce a mime type parser to a WPT web-platform-tests/wpt#8275

Closed

annevk mentioned this issue Nov 26, 2017

MIME type parsing, stricter rules #44

Closed

annevk closed this as completed Nov 29, 2017

foolip mentioned this issue Dec 6, 2017

Test stuff for https://github.com/whatwg/mimesniff/issues/39 web-platform-tests/wpt#7882

Closed
