Specify how document.cookie diverges from [COOKIES] RFC #804
Comments
Paging @mikewest. |
I'd love to help! Here is what we can do:
I will try to provide test results sometime next week. |
Cool! Another thing here worth checking is |
@annevk AFAIK browsers use the same code for all cookie parsing scenarios. Spec violations in |
I see, in that case it seems like something @mikewest and @mnot should be solving in the RFC. Your testing will still be useful, obviously, but given the scope of the problem it does not seem like something that needs to be addressed in the HTML Standard. Although I can understand if we need to make adjustments for a revised RFC that does handle this properly. |
So, we will continue the discussion here for now, and once we have some data and analysis we will ping the IETF guys, I guess? |
Very good timing. We're about to start opening up the cookie RFC, so yes do ping us when you have some results. Any idea how long that will be? |
OMG, this is amazing!! |
@inikulin this is really sobering. Thank you! What was the effective document charset for the test page? |
@bsittler UTF-8 |
FYI test runner sources are here: https://github.com/inikulin/cookie-compat |
Thank you guys for all the kind words, I hope you will find it useful. Further steps:
|
Wow indeed, really great stuff! It seems to me that the first 17 tests could be brought into (at least rough) interop with a fairly simple spec change to Section 5.2. The remaining tests demonstrate enough interop that they look more like browser bugs to me. That's assuming that all of the browsers don't want to fix the underlying bugs in the first 17 tests, of course. It'd be very useful to know how much content on the Web currently relies upon this behaviour, but gathering that data is likely to be problematic... If we do want to change the spec, someone will need to write up an Internet-Draft describing the proposed changes. I can help with that. @inikulin would you mind pinging the HTTP-WG about this on its mailing list https://lists.w3.org/Archives/Public/ietf-http-wg/? If you don't want to subscribe, I can forward a message for you, or you could even just open up a bug at https://github.com/httpwg/http-extensions/issues. I just want to make sure that you get credit for this awesome work. |
@inikulin what was the system codepage for Edge and IE? Have you tried changing it? If https://stackoverflow.com/questions/1969232/allowed-characters-in-cookies is to be believed, non-ASCII characters may "work" in IE when they are present in the system codepage, where "work" means they will be wire-encoded in that codepage (never UTF-8, since Windows system codepage can't be set to 65001) but exposed to JavaScript using the corresponding Unicode characters. I'd be especially interested to see the results for systems with larger-coverage (CJK?) or non-1252 system codepages. Likewise, have you tried server-generated cookies with encodings other than UTF-8, e.g. latin-1? |
Nope, I haven't adjusted the Windows code page for the tests. I'll try to run with codepages that have a bigger character set tomorrow at work, because I don't currently have access to a Windows machine. |
Nope |
One more thought: it may be worth checking both reading and writing behavior of the backslash \u005c
Same question for tilde \u007e
(I'm asking these oddly specific questions because I'm wondering whether all of printable ASCII other than semicolon is actually safe in cookie values across browsers)
Edit: names too (barring equal sign of course)
Edit: Also, in the meta http-equiv case, are the results the same for raw document-charset characters vs. HTML-entified versions?
More edit: Yet another IE-specific question: does document.cookie in IE (and Edge?) round-trip Unicode when the characters are first converted to bytes? e.g. |
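A rough sketch of how such a per-character check could be scripted. The helper name `probeCookieValueChars` and the cookie name `probe` are made up for illustration and are not part of cookie-compat; `doc` is assumed to be anything exposing a `document.cookie`-like accessor, so in a browser one would pass `document`.

```javascript
// Hypothetical probe, not taken from the cookie-compat test runner.
// Writes each character inside a cookie value, reads the cookie string
// back, and records every character that did not survive unchanged.
function probeCookieValueChars(doc, chars, name = "probe") {
  const failures = [];
  for (const ch of chars) {
    const expected = `a${ch}b`;
    doc.cookie = `${name}=${expected}`;
    // Reading yields "k=v; k2=v2"; find our entry again.
    const entry = doc.cookie
      .split("; ")
      .find((kv) => kv.startsWith(name + "="));
    const value =
      entry === undefined ? undefined : entry.slice(name.length + 1);
    if (value !== expected) {
      failures.push({ ch, value });
    }
  }
  return failures;
}
```

In a browser console, `probeCookieValueChars(document, "\\~")` would cover the backslash and tilde cases asked about above. An empty result only says the characters round-tripped through script in that one browser, nothing about what goes over the wire.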
I've added results for IE and Edge with system codepage 950 (big5) and 932 (shift_jis): http://inikulin.github.io/cookie-compat/ (spoiler: it didn't work out) Regarding #804 (comment) if you wouldn't mind, I will work on it later, because I'm really running out of spare time currently. I've created an issue in cookie-compat for this task so I don't forget about it: inikulin/cookie-compat#3 |
Thank you very much On Wed, Jun 22, 2016, 05:21 Ivan Nikulin [email protected] wrote:
|
On Windows 7 with a US English system locale running IE 9, JavaScript-written cookies subsequently read from JavaScript seem to reliably round-trip characters whose ISO 8859-1 encodings fall in the ISO 2022 GR range (0xA0 ... 0xFF) in addition to most of printable ASCII. This seems to be the case regardless of the document character encoding. Additionally, I tried a few characters whose Windows-1252 encodings fall in the ISO 2022 C1 range (0x80 ... 0x9F) and they appear to round-trip successfully, too. Characters not representable in Windows-1252 are apparently converted to question mark (other printable characters) or dropped (ASCII control characters.) I have not yet tested with a different system locale. I suspect that cookies are simply serialized in the IE cookie jar using the default codepage of the system locale. |
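To make the observed behavior concrete, here is a small simulation of what a script would read back under a Windows-1252 system codepage, based only on the description above: printable ASCII, the 0xA0 ... 0xFF range, and the Windows-1252 characters in 0x80 ... 0x9F survive, other printable characters become a question mark, and ASCII control characters are dropped. The function name is hypothetical, and this is a model of the observation, not IE's actual code. Treating U+0080 ... U+009F as unrepresentable is an assumption of the sketch.

```javascript
// Characters that Windows-1252 places in 0x80 ... 0x9F.
const CP1252_EXTRAS =
  "\u20ac\u201a\u0192\u201e\u2026\u2020\u2021\u02c6\u2030\u0160\u2039\u0152" +
  "\u017d\u2018\u2019\u201c\u201d\u2022\u2013\u2014\u02dc\u2122\u0161\u203a" +
  "\u0153\u017e\u0178";

// Hypothetical model of a write-then-read through a cp1252 cookie jar.
function simulateCp1252RoundTrip(text) {
  let out = "";
  for (const ch of text) { // iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp === 0x7f) {
      continue; // ASCII control characters: dropped
    }
    const representable =
      cp <= 0x7e || // printable ASCII (controls were handled above)
      (cp >= 0xa0 && cp <= 0xff) || // same code points as ISO 8859-1
      CP1252_EXTRAS.includes(ch);
    out += representable ? ch : "?"; // everything else: question mark
  }
  return out;
}
```

For example, `simulateCp1252RoundTrip("caf\u00e9 \u20ac5")` leaves the string unchanged, while three CJK characters come back as `"???"`.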
Indeed, after switching the system locale to Japanese (with "ANSI" and "OEM" codepages both switched to 932) and rebooting, cookies behave exactly as if they are being stored in CP932 (approximately Shift JIS), with characters like Euro sign \u20ac converted to question mark and Japanese text preserved. This is independent of document charset, so the same Japanese text written by script running in a Shift JIS document is readable by script running in a UTF-8 document without mangling, and vice versa. |
Wow, that is not something we want to standardize upon. How would that even work with code points that cannot be represented by the encoding? |
It doesn't. They are converted to question marks (in other words, data is On Tue, Jun 28, 2016, 00:02 Anne van Kesteren [email protected]
|
Just did a little further testing, and verified that even with explicit UTF-8 or UTF-16 (little-endian) byte-order marks in the cookie name and/or cookie value, IE and Edge still always interpret the cookie according to the system "ANSI" codepage. Non-ASCII cookie names and values set by the server are sent back to the server without mangling, so there's nothing to prevent a server from storing UTF-8 in a cookie (e.g. UTF-8 cookie names/values containing Also, attempts to set cookies from scripts with "ANSI" code page-unrepresentable characters in their names and/or values do not always convert those to question marks - sometimes a different fallback is used. For instance, with a US English system locale |
I'm doubtful that further testing of IE/Edge's quirks is going to be helpful. We know they do weird stuff they would never put into a web spec. |
Right, I was merely attempting to assess the compatibility risk of having the new API only support UTF-8 (and possibly also "raw byte array") interpretation for cookie data, which would be incompatible (in Edge) with the system "ANSI" codepage interpretation in |
One "fun" thing I noticed today: |
Currently the spec says
However, in the real world things like
document.cookie = "foo"
work and have an effect. There are probably many other possibilities; in general the RFC just has a grammar that things might not match, whereas I imagine browsers just accept anything and try to make sense of it, even if it fails to match the grammar.
@bsittler noticed this while working on some service worker cookie stuff, and previously it has come up in the jsdom project and its related tough-cookie helper:
@Sebmaster and @inikulin led the charge for this in jsdom, so maybe they could help us spec the correct behavior for how document.cookie parses cookies? Alternately, looking at open-source browser code would get us pretty far.
This might be a compat issue if everyone hasn't managed to magically converge on a single behavior despite the lack of precise spec. Tentatively tagging as such for now.
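As a starting point for discussion, here is a minimal sketch of the difference between a strict reading and a lenient one for input like "foo". The function name and the `strict` option are hypothetical. The lenient branch assumes that a pair with no equals sign becomes a cookie with an empty name and the whole string as its value; that is one plausible behavior chosen for illustration, not something verified against browser source in this thread.

```javascript
// Hypothetical helper for illustration only.
// Extracts the name-value pair from a string assigned to document.cookie.
function parseNameValuePair(setCookieString, { strict = false } = {}) {
  // Everything before the first ";" is the name-value pair;
  // attributes such as Path or Expires are ignored in this sketch.
  const semi = setCookieString.indexOf(";");
  const pair = semi === -1 ? setCookieString : setCookieString.slice(0, semi);

  const eq = pair.indexOf("=");
  if (eq === -1) {
    // Strict reading: no "=" means the whole string is ignored.
    if (strict) return null;
    // Lenient reading (assumption): empty name, whole pair as value.
    return { name: "", value: pair.trim() };
  }
  // Split on the first "=" only, so values may contain further "=".
  return {
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
  };
}
```

With this sketch, `parseNameValuePair("foo")` yields an empty name with value `"foo"`, while `parseNameValuePair("foo", { strict: true })` yields `null`. Whichever behavior ends up specified, the point is that the algorithm has to say what happens for every input, not only for strings that match the grammar.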