sorted list by code unit #179
Comments
DOMStrings are not UTF-16. They are sequences of 16-bit code units; they are not required to be valid UTF-16. This is equivalent to calling |
PR at #182 Also note that this is existing behavior from previous iterations of the spec. |
I'm not quite buying it. I mean, yes, they aren't exactly UTF-16, in exactly the same way as we're discussing in the WHATWG infra space just now. But really they are strings and there is a clear relationship to Unicode. The sort you mention is fast and easy, but it ignores supplementary characters and the unfortunate sorting of those characters as surrogate pairs. I18n would prefer to make supplementary characters work as first-class citizens. And note that this is an index, not just a bag of strings. There is provision for range operations, and these fail utterly on surrogate pairs. |
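The range concern in the comment above can be illustrated with a minimal sketch. It assumes a hypothetical lower bound of U+E000 and uses plain JavaScript string comparison, which works on UTF-16 code units; it is not taken from the spec or from any IndexedDB implementation, and all variable names are illustrative.

```typescript
// JavaScript's relational operators compare strings by UTF-16 code unit.
const lowerBound = "\uE000";   // U+E000, stored as a single code unit
const emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

// Code unit view: the lead surrogate 0xD83D is less than 0xE000,
// so the emoji falls below the bound.
const includedByCodeUnit = emoji >= lowerBound;

// Code point view: 0x1F600 is greater than 0xE000,
// so the emoji is at or above the bound.
const includedByCodePoint =
  (emoji.codePointAt(0) as number) >= (lowerBound.codePointAt(0) as number);
```

Under code unit order, a supplementary character sorts below every BMP character from U+E000 to U+FFFF, which is the mismatch the comment describes.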
"that this is an index, not just a bag of strings" The "sorted list" construct is used only when getting the names of a set of object stores or indexes: The names are identifiers specified by the developer in code. And... this is unchanged since the first version of the spec (the latter is new, but must be consistent), and is interoperably deployed in all browsers already. |
I'm doing this on a tablet in an airport, bear with me. Interoperability is harmed by changing the runtime sort? The names need to stay consistent, but the ordering could be improved. If the list is in a different order on a different browser, is that harmful? Developers can use non-ASCII names too. Many such names are generated from customer data or input as well. "The sorted list is used only when getting the names": probably for presentation. Computers don't care. You'd use a hash otherwise. :-) |
As an aside, I think this construct should change to "sorted name list" for clarity, since it's only used for this one purpose. I'll update #184. The list was originally unsorted; then someone complained about the possible inconsistency when enumerating, and so we agreed on a sorting convention and enshrined it in the spec. sigh |
Oops, I meant #182 |
https://www.w3.org/TR/IndexedDB-2/#constructs
While sorting by code unit is easy to do in (UTF-16) DOMString, it produces a perverse order for supplementary characters (i.e. those > U+FFFF). Shouldn't this be sorted by codepoint?
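To make the difference concrete, here is a minimal sketch contrasting the two orders. `compareByCodePoint` is a hypothetical helper written for this illustration, not something defined by the spec; the default `Array.prototype.sort` (no comparator) orders strings by UTF-16 code unit.

```typescript
// Compare two strings by Unicode code point instead of UTF-16 code unit.
function compareByCodePoint(a: string, b: string): number {
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const ca = a.codePointAt(i) as number;
    const cb = b.codePointAt(j) as number;
    if (ca !== cb) return ca - cb;
    // A supplementary code point occupies two code units.
    i += ca > 0xffff ? 2 : 1;
    j += cb > 0xffff ? 2 : 1;
  }
  // Common prefix is equal: the shorter string sorts first.
  return (a.length - i) - (b.length - j);
}

// "a" (U+0061), FULLWIDTH TILDE (U+FF5E), GRINNING FACE (U+1F600 = D83D DE00)
const names: string[] = ["\uFF5E", "\uD83D\uDE00", "a"];

// Default sort: code unit order, so the surrogate pair (0xD83D...) precedes U+FF5E.
const byCodeUnit = names.slice().sort();

// Code point order: U+FF5E precedes U+1F600.
const byCodePoint = names.slice().sort(compareByCodePoint);
```

Here `byCodeUnit` is `["a", "\uD83D\uDE00", "\uFF5E"]` while `byCodePoint` is `["a", "\uFF5E", "\uD83D\uDE00"]`. The two orders differ only when a supplementary character is compared against a BMP character in the range U+E000 to U+FFFF.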
Note too that a list sorted by code unit/codepoint will not be accessible to users.
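As a rough illustration of this point, the sketch below compares the default code unit sort with a locale-aware sort using `Intl.Collator`. The German locale and the sample names are assumptions chosen for the example; nothing here is proposed spec behavior.

```typescript
// Sample names; "\u00E4" is LATIN SMALL LETTER A WITH DIAERESIS (ä).
const storeNames: string[] = ["z", "\u00E4", "a"];

// Default sort: code unit order places U+00E4 after "z" (U+007A).
const codeUnitOrder = storeNames.slice().sort();

// Locale-aware sort: German collation places ä next to "a".
const germanOrder = storeNames.slice().sort(new Intl.Collator("de").compare);
```

`codeUnitOrder` comes out as `["a", "z", "ä"]`, whereas `germanOrder` is `["a", "ä", "z"]`, which is closer to what a reader would expect in a user-facing list. A page that displays these names could apply a collator of its choice to the returned list itself.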
Note that there is a sorting algorithm here that works on code units
This is I18N comment w3c/i18n-activity#362. Please add the i18n-comment label.