Re-investigate "compatibility caseless" matching #1666

Closed
domenic opened this issue Aug 12, 2016 · 16 comments · Fixed by #1941
Assignees
Labels
compat Standard is not web compatible or proprietary feature needs standardizing i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. normative change

Comments

@domenic
Member

domenic commented Aug 12, 2016

The spec mandates "compatibility caseless" matching for radio button groups and hash fragments. However, this doesn't seem to be what browsers do, as evidenced by http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4377 (all radio groups are separate despite their names being compatibility-caseless matches).

Background:

What I really need here is a set of test cases that can distinguish what algorithm is being used unambiguously, so that we can run them against all modern browsers and see where reality lands. I think we need to distinguish between a few alternatives:

  • Caseless
  • Compatibility caseless
  • Canonical caseless
  • Identifier caseless
  • "Bad" algorithms that do things like lowercasing both and comparing (instead of case-folding both and comparing)

My test http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4377 seems to rule out compatibility caseless but it doesn't help me distinguish between the other possibilities. I'd appreciate help with that.

Source inspection reveals that Blink goes with the lowercasing option, whereas WebKit does case folding (so I guess just "caseless"). At least for usemap; I haven't checked radio button groups yet.

An alternate approach is to continue the source inspection on all open-source browsers. But that won't help us write web platform tests...
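
For reference, the difference between the "lowercase both and compare" option and the "case-fold both and compare" option can be sketched in Python. This is only an illustration of the two string operations, not Blink's or WebKit's actual code; the function names are made up for the example.

```python
def lowercase_match(a: str, b: str) -> bool:
    """The "bad" approach: lowercase both strings, then compare."""
    return a.lower() == b.lower()


def casefold_match(a: str, b: str) -> bool:
    """Plain caseless matching: case-fold both strings, then compare."""
    return a.casefold() == b.casefold()


# Plain ASCII names behave the same under both approaches.
print(lowercase_match("Group", "GROUP"), casefold_match("Group", "GROUP"))

# U+00DF (sharp s) lowercases to itself but case-folds to "ss",
# so only the case-folding comparison treats these two names as equal.
print(lowercase_match("stra\u00dfe", "STRASSE"), casefold_match("stra\u00dfe", "STRASSE"))
```

A test case that separates the two therefore needs a character whose case folding differs from its lowercase form; characters without case mappings cannot tell them apart.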

/cc @r12a @aphillips @nattokirai @littledan

@domenic domenic added normative change compat Standard is not web compatible or proprietary feature needs standardizing labels Aug 12, 2016
@aphillips
Contributor

Please add the 'i18n' label to this item so that our WG can track.

The I18N WG has regularly and consistently begged HTML to remove compatibility caseless.

@domenic You're right: that test doesn't really distinguish because it uses various circled/decorated numbers and numbers have no case mapping. What you need are case fold tests. When I get a chance later I'll try to make you a good test list.

@domenic domenic added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Aug 12, 2016
@domenic
Member Author

domenic commented Aug 12, 2016

@aphillips added! https://github.com/whatwg/html/labels/topic%3A%20i18n

Thanks so much for being willing to help with the test cases!

@aphillips
Contributor

aphillips commented Aug 12, 2016

First stab here: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4380

@r12a do you have any tests hanging around?

@domenic
Member Author

domenic commented Aug 15, 2016

Thanks @aphillips! Here are my results in various browsers, with ~ indicating that they were in the same group (i.e. could not both be selected at once):

  • Firefox
    • ① ⑴ ⒈: all separate
    • À À À à à: À ~ à, À ~ à
    • Å Å: Å ~ Å (all the same)
    • ᾛ ἣι ᾓ: ᾛ ~ ᾓ
    • A a: A ~ a
  • Edge
    • ① ⑴ ⒈: all separate
    • À À À à à: À ~ À ~ À ~ à ~ à (all the same)
    • Å Å: Å ~ Å (all the same)
    • ᾛ ἣι ᾓ: ᾛ ~ ᾓ
    • A a: A ~ a
  • Chrome (same as Firefox)
    • ① ⑴ ⒈: all separate
    • À À À à à: À ~ à, À ~ à
    • Å Å: Å ~ Å (all the same)
    • ᾛ ἣι ᾓ: ᾛ ~ ᾓ
    • A a: A ~ a
  • Safari Tech Preview (no case folding)
    • ① ⑴ ⒈: all separate
    • À À À à à: all separate
    • Å Å: all separate
    • ᾛ ἣι ᾓ: all separate
    • A a: all separate

Does the ~ relation given by Edge, or by Chrome/Firefox, match any of the Unicode-defined equivalences?

@aphillips
Contributor

@domenic: Seems straightforward.

Edge appears to be doing something similar to compatibility caseless, which is defined as:

NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) = NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))

Chrome/Firefox is doing what CharmodNorm calls Unicode C+F or normal humans would call "caseless".

Note that older IE was known to do something "similar to" compatibility caseless, so Edge may be doing the same thing. I don't know what the difference between "similar to" and compatibility caseless was. We could probably find out by writing a bunch of tests.
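
As a rough sketch, the matching definitions discussed here can be written in Python using `str.casefold` (full case folding) and `unicodedata.normalize`. The compatibility caseless function follows the formula quoted above; the caseless and canonical caseless functions reflect my understanding of the Unicode Standard's definitions, and the function names are illustrative. None of this is any browser's implementation.

```python
import unicodedata


def caseless(x: str, y: str) -> bool:
    # toCasefold(X) = toCasefold(Y)
    return x.casefold() == y.casefold()


def _canonical_key(s: str) -> str:
    # NFD(toCasefold(NFD(s)))
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded)


def canonical_caseless(x: str, y: str) -> bool:
    return _canonical_key(x) == _canonical_key(y)


def _compat_key(s: str) -> str:
    # NFKD(toCasefold(NFKD(toCasefold(NFD(s)))))
    inner = unicodedata.normalize("NFD", s).casefold()
    middle = unicodedata.normalize("NFKD", inner).casefold()
    return unicodedata.normalize("NFKD", middle)


def compatibility_caseless(x: str, y: str) -> bool:
    return _compat_key(x) == _compat_key(y)


# Sample pairs, written as escapes so that normalization of this page
# cannot change them: circled digit one vs "1", precomposed capital
# A-grave vs "a" + combining grave, and ANGSTROM SIGN vs a-ring.
pairs = [("\u2460", "1"), ("\u00c0", "a\u0300"), ("\u212b", "\u00e5")]
for x, y in pairs:
    print(caseless(x, y), canonical_caseless(x, y), compatibility_caseless(x, y))
```

Only the compatibility variant equates the circled digit with a plain "1", which is why a test built solely from decorated numbers can rule compatibility caseless in or out but cannot separate the other options.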

@domenic
Member Author

domenic commented Aug 15, 2016

OK, cool. Since we have some freedom here to pick between incompatible alternatives, I think the best approach is to take the I18N WG's advice as to which alternative would be best. I assume the I18N WG would prefer Unicode C+F to compatibility caseless? If so, can you tell me the best way to reference that algorithm? I thought we were supposed to go for the Unicode Standard chapter 3 section 3.13, but I guess maybe we should refer to CharmodNorm's "Unicode case-insensitive matching"?

It would also be ideal if someone could take on the responsibility of updating https://github.com/w3c/web-platform-tests/blob/master/html/semantics/forms/the-input-element/radio-groupname-case.html, which I presume currently expects compatibility caseless...

@annevk
Member

annevk commented Aug 16, 2016

What Safari does (also in stable) seems vastly superior to the alternatives. Avoiding complicated Unicode algorithms for identifiers is still the goal.

@aphillips
Contributor

@annevk: I don't agree with the sentiment that case-insensitive matching is a "complicated Unicode algorithm" or that algorithm avoidance should be our primary goal. Our primary goal should be to produce something that is easy and meaningful to use for browser users and page authors.

Which means that I agree with you on the choice of case-sensitive matching. The test by itself demonstrates why it's a better choice: there appears to be no reason why radio buttons with distinct string identifiers should "light up" in tandem. It doesn't appear to add any value to have radio buttons behave otherwise.

Is there a compelling argument for case _in_sensitive matching?

@domenic
Member Author

domenic commented Aug 16, 2016

I think the only argument for insensitive is legacy compat. But WebKit provides a compelling counterexample.

Note that there's a separate use of "compatibility caseless" for image maps, for which I believe WebKit does do some form of case folding...

Maybe the right thing to do here is to see if Chrome could add use counters for scenarios where there are case-insensitive matches (both for radio buttons and image maps). If the counters are very low, we could change the spec to be case sensitive.

@domenic
Member Author

domenic commented Aug 19, 2016

I filed https://bugs.chromium.org/p/chromium/issues/detail?id=639477 and cc'ed @esprehn who is bullish about moving to case-sensitive matching.

@domenic
Member Author

domenic commented Oct 20, 2016

It looks like the use counters have reached stable with Chrome 54. The results are:

  • For image maps:
    • 0.0084% of pages have a map attribute that matches using strict case-sensitive matching
    • <0.0001% of pages have a map attribute that matches using ASCII case-insensitive matching, but not using case-sensitive matching
    • Zero additional matches are triggered by Chrome's Unicode case-insensitive matching
  • For radio buttons
    • 0.1846% of pages have radio buttons that match using strict case-sensitive matching
    • <0.0001% of pages have other radio buttons that get grouped based on being an ASCII case-insensitive match but not a case-sensitive match
    • Zero additional matches are triggered by Chrome's Unicode case-insensitive matching

Thus, I think we should move to converge the spec and implementations with WebKit, and only match case-sensitively for both cases. \o/
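
To make the three counter buckets concrete, here is a minimal Python sketch that classifies a pair of names by the strictest comparison under which they match. It is not Chrome's use counter code: the bucket labels and function names are invented for the example, and `str.casefold` stands in for "Unicode case-insensitive matching", which is an assumption about what that comparison does.

```python
from collections import Counter
from itertools import combinations

# Translation table that lowercases only A-Z, leaving every other character alone.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def classify(a: str, b: str) -> str:
    """Return the strictest comparison under which a and b match."""
    if a == b:
        return "case-sensitive"
    if ascii_lower(a) == ascii_lower(b):
        return "ascii-case-insensitive"
    if a.casefold() == b.casefold():
        return "unicode-case-insensitive"
    return "no-match"


def tally(names: list) -> Counter:
    """Count every unordered pair of names by its bucket."""
    return Counter(classify(a, b) for a, b in combinations(names, 2))


print(tally(["size", "size", "Size", "colour"]))
```

Under this model, a move to case-sensitive matching changes behavior only for pairs that land in the second or third bucket, which is what the counters above measure.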

@domenic domenic self-assigned this Oct 20, 2016
domenic added a commit that referenced this issue Oct 20, 2016
This fixes #1666. As discussed there, browsers are not interoperable
about what type of Unicode case-insensitivity they implement here, with
WebKit even using case-sensitive matching for the radio button case (but
not for the image map case). Data from Blink's use counters reveals
however that the Unicode case-insensitivity is never triggered, and even
ASCII case-insensitivity is triggered extraordinarily rarely.
Additionally, the semantics of these attributes is more like an
identifier than anything else, and so case-insensitive comparison never
really made sense in the first place (it was only done for legacy
Internet Explorer compatibility). As such, we move to converge on
case-sensitive matching in all cases.
domenic added a commit that referenced this issue Oct 24, 2016
@bzbarsky
Contributor

@domenic Thank you for adding tests for this and filing https://bugzilla.mozilla.org/show_bug.cgi?id=1312456

@cdumez

cdumez commented Mar 21, 2017

FYI, updating WebKit to treat usemap case-sensitively broke an internal Apple site. Sadly, this site used to work in Safari 10, Firefox 52 and Chrome 57. The site is now broken in Safari TP, Firefox nightly 55 and Chrome Canary 59. This is the only evidence of breakage we have so far and this is an internal site so this is not too bad. I just thought I would mention it here in case others see breakage too.

@domenic
Member Author

domenic commented Mar 21, 2017

Thanks for reporting in. Would the internal site have worked if we'd used ASCII case-insensitive matching?

@cdumez

cdumez commented Mar 21, 2017

Yes:
usemap="#homebarmap" / <map name="HomeBarMap">
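
The breakage can be reproduced with a simplified model of how a usemap value is resolved against the map names on a page. This is a hedged sketch: `resolve_usemap` and the comparison helpers are hypothetical names, and the lookup (take everything after the first "#", return the first matching name) is a simplification of the spec's hash-name reference rules, not browser code.

```python
from typing import Callable, Optional, Sequence

# Translation table that lowercases only A-Z.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def case_sensitive(a: str, b: str) -> bool:
    return a == b


def ascii_case_insensitive(a: str, b: str) -> bool:
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def resolve_usemap(
    usemap: str,
    map_names: Sequence[str],
    same: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first map name that the usemap value refers to, or None."""
    if "#" not in usemap:
        return None
    target = usemap.split("#", 1)[1]  # everything after the first "#"
    for name in map_names:
        if same(target, name):
            return name
    return None


names = ["HomeBarMap"]
print(resolve_usemap("#homebarmap", names, case_sensitive))
print(resolve_usemap("#homebarmap", names, ascii_case_insensitive))
```

With the case-sensitive comparison the lookup finds no map, so the image has no clickable areas; with the ASCII case-insensitive comparison it resolves to `HomeBarMap`.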

Zirro added a commit to Zirro/jsdom that referenced this issue Nov 30, 2018
This follows a spec update which was discussed in whatwg/html#1666.
domenic pushed a commit to jsdom/jsdom that referenced this issue Dec 2, 2018
This follows a spec update which was discussed in whatwg/html#1666.
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
6 participants