Re-investigate "compatibility caseless" matching #1666
Please add the 'i18n' label to this item so that our WG can track. The I18N WG has regularly and consistently begged HTML to remove compatibility caseless. @domenic You're right: that test doesn't really distinguish because it uses various circled/decorated numbers and numbers have no case mapping. What you need are case fold tests. When I get a chance later I'll try to make you a good test list.
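A minimal sketch of the point about circled numbers, using Python's built-in case operations as a stand-in for whatever a browser does. The helper `has_case_mapping` and the sample characters are illustrative assumptions, not part of any proposed test list.

```python
def has_case_mapping(ch: str) -> bool:
    """True if lowercasing, uppercasing, or case folding changes the character."""
    return ch.lower() != ch or ch.upper() != ch or ch.casefold() != ch

# Circled digits: no case mapping, so they cannot tell case algorithms apart.
circled_digits = ["\u2460", "\u2461", "\u2462"]  # ① ② ③

# Characters whose case mappings or foldings are non-trivial.
fold_candidates = [
    "\u00df",  # ß  LATIN SMALL LETTER SHARP S
    "\u017f",  # ſ  LATIN SMALL LETTER LONG S
    "\u212a",  # K  KELVIN SIGN
    "\ufb01",  # ﬁ  LATIN SMALL LIGATURE FI
]

print([has_case_mapping(c) for c in circled_digits])   # all False
print([has_case_mapping(c) for c in fold_candidates])  # all True
```

Names built from the second list are the kind of input that can separate one case-insensitive algorithm from another.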
@aphillips added! https://github.com/whatwg/html/labels/topic%3A%20i18n Thanks so much for being willing to help with the test cases!
First stab here: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4380 @r12a do you have any tests hanging around?
Thanks @aphillips! Here are my results in various browsers, with ~ indicating that they were in the same group (i.e. could not both be selected at once):
Does the ~ relation given by Edge, or by Chrome/Firefox, match any of the Unicode-defined equivalences?
@domenic: Seems straightforward. Edge appears to be doing similar to
Chrome/Firefox is doing what CharmodNorm calls Note that older IE was known to do something "similar to" compatibility caseless, so Edge may be doing the same thing. I don't know what the difference between "similar to" and compatibility caseless was. We could probably find out by writing a bunch of tests.
OK, cool. Since we have some freedom here to pick between incompatible alternatives, I think the best approach is to take the I18N WG's advice as to which alternative would be best. I assume the I18N WG would prefer Unicode C+F to compatibility caseless? If so, can you tell me the best way to reference that algorithm? I thought we were supposed to go for the Unicode Standard chapter 3 section 3.13, but I guess maybe we should refer to CharmodNorm's "Unicode case-insensitive matching"? It would also be ideal if someone could take on the responsibility of updating https://github.com/w3c/web-platform-tests/blob/master/html/semantics/forms/the-input-element/radio-groupname-case.html, which I presume currently expects compatibility caseless...
What Safari does (also in stable) seems vastly superior to the alternatives. Avoiding complicated Unicode algorithms for identifiers is still the goal.
@annevk: I don't agree with the sentiment that case-insensitive matching is a "complicated Unicode algorithm" or that algorithm avoidance should be our primary goal. Our primary goal should be to produce something that is easy and meaningful to use for browser users and page authors. Which means that I agree with you on the choice of case-sensitive matching. The test by itself demonstrates why it's a better choice: there appears to be no reason why radio buttons with distinct string identifiers should "light up" in tandem. It doesn't appear to add any value to have radio buttons behave otherwise. Is there a compelling argument for case _in_sensitive matching?
I think the only argument for insensitive is legacy compat. But WebKit provides a compelling counterexample. Note that there's a separate use of "compatibility caseless" for image maps, for which I believe WebKit does do some form of case folding... Maybe the right thing to do here is to see if Chrome could add use counters for scenarios where there are case-insensitive matches (both for radio buttons and image maps). If the counters are very low, we could change the spec to be case sensitive.
I filed https://bugs.chromium.org/p/chromium/issues/detail?id=639477 and cc'ed @esprehn who is bullish about moving to case-sensitive matching.
It looks like the use counters have reached stable with Chrome 54. The results are:
Thus, I think we should move to converge the spec and implementations with WebKit, and only match case-sensitively for both cases. \o/
This fixes #1666. As discussed there, browsers are not interoperable about what type of Unicode case-insensitivity they implement here, with WebKit even using case-sensitive matching for the radio button case (but not for the image map case). Data from Blink's use counters reveals however that the Unicode case-insensitivity is never triggered, and even ASCII case-insensitivity is triggered extraordinarily rarely. Additionally, the semantics of these attributes is more like an identifier than anything else, and so case-insensitive comparison never really made sense in the first place (it was only done for legacy Internet Explorer compatibility). As such, we move to converge on case-sensitive matching in all cases.
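A minimal sketch of what treating the name as an identifier means for radio button grouping. It deliberately ignores form owners and tree structure, and the function `radio_groups` and the sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def radio_groups(names: Sequence[str]) -> Dict[str, List[int]]:
    """Group radio button indices by exact (case-sensitive) name."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, name in enumerate(names):
        if name:  # a button with an empty name joins no group
            groups[name].append(index)
    return dict(groups)

# "size", "Size" and "SIZE" are three different identifiers.
print(radio_groups(["size", "Size", "size", "SIZE"]))
# {'size': [0, 2], 'Size': [1], 'SIZE': [3]}
```

Under case-sensitive matching only the two buttons spelled `size` are mutually exclusive; the other two each form their own group.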
@domenic Thank you for adding tests for this and filing https://bugzilla.mozilla.org/show_bug.cgi?id=1312456
FYI, updating WebKit to treat usemap case-sensitively broke an internal Apple site. Sadly, this site used to work in Safari 10, Firefox 52 and Chrome 57. The site is now broken in Safari TP, Firefox nightly 55 and Chrome Canary 59. This is the only evidence of breakage we have so far and this is an internal site so this is not too bad. I just thought I would mention it here in case others see breakage too.
Thanks for reporting in. Would the internal site have worked if we'd used ASCII case-insensitive matching?
Yes:
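For illustration only (the internal site's markup is not shown in this thread): a minimal Python sketch of how a `usemap` reference that differs from the map name only in ASCII case resolves under the two rules being compared. The functions `ascii_lower` and `resolve_usemap`, the simplified hash-name parsing, and the sample values `#NavMap` and `navmap` are all hypothetical.

```python
from typing import Optional, Sequence

def ascii_lower(s: str) -> str:
    # Lowercase only A-Z; non-ASCII code points are left untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def resolve_usemap(usemap: str, map_names: Sequence[str],
                   ascii_insensitive: bool) -> Optional[str]:
    # Simplified hash-name reference: everything after the first '#'.
    _, hash_sign, target = usemap.partition("#")
    if not hash_sign:
        return None
    for name in map_names:
        if name == target:
            return name
        if ascii_insensitive and ascii_lower(name) == ascii_lower(target):
            return name
    return None

print(resolve_usemap("#NavMap", ["navmap"], ascii_insensitive=False))  # None
print(resolve_usemap("#NavMap", ["navmap"], ascii_insensitive=True))   # navmap
```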
This follows a spec update which was discussed in whatwg/html#1666.
The spec mandates "compatibility caseless" matching for radio button groups and hash fragments. However, this doesn't seem to be what browsers do, as evidenced by http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4377 (all radio groups are separate despite their names being compatibility-caseless matches).
Background:
What I really need here is a set of test cases that can distinguish what algorithm is being used unambiguously, so that we can run them against all modern browsers and see where reality lands. I think we need to distinguish between a few alternatives:
My test http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4377 seems to rule out compatibility caseless but it doesn't help me distinguish between the other possibilities. I'd appreciate help with that.
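One way to think about such a test set, sketched in Python. The list of candidate algorithms below is an assumption (the alternatives enumerated in this post did not survive), Python's `str.lower` and `str.casefold` only approximate what a given browser implements, and every name here is illustrative. The compatibility caseless key follows the definition in the Unicode Standard, section 3.13: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))).

```python
import unicodedata

def ascii_lower(s: str) -> str:
    # Lowercase only A-Z; leave every other code point untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def compat_caseless_key(s: str) -> str:
    # NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))
    nfd = unicodedata.normalize("NFD", s)
    once = unicodedata.normalize("NFKD", nfd.casefold())
    return unicodedata.normalize("NFKD", once.casefold())

# Assumed candidates, ordered roughly from strictest to loosest.
ALGORITHMS = {
    "case-sensitive": lambda s: s,
    "ASCII case-insensitive": ascii_lower,
    "Unicode lowercasing": str.lower,
    "Unicode case folding": str.casefold,
    "compatibility caseless": compat_caseless_key,
}

def matching_algorithms(a: str, b: str) -> list:
    """Names of the algorithms under which a and b compare equal."""
    return [name for name, key in ALGORITHMS.items() if key(a) == key(b)]

# Each probe pair matches under a different subset of the algorithms.
PROBES = [
    ("a", "A"),        # everything except case-sensitive
    ("\u212a", "k"),   # KELVIN SIGN: not ASCII, but lowercases to k
    ("\u00df", "SS"),  # sharp s: needs full case folding
    ("\u2460", "1"),   # circled one: needs compatibility decomposition
]
for a, b in PROBES:
    print(repr(a), repr(b), matching_algorithms(a, b))
```

Giving two radio buttons the names from one probe pair and checking whether they end up in the same group would then narrow down which of these behaviors a browser exhibits.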
Source inspection reveals that Blink goes with the lowercasing option, whereas WebKit does case folding (so I guess just "caseless"). At least for usemap; I haven't checked radio button groups yet.
An alternate approach is to continue the source inspection on all open-source browsers. But that won't help us write web platform tests...
/cc @r12a @aphillips @nattokirai @littledan