sorted list by code unit #179

Closed
aphillips opened this issue Mar 23, 2017 · 7 comments
Labels: has-pr, i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on), needs-pr

@aphillips

https://www.w3.org/TR/IndexedDB-2/#constructs

A sorted list is a DOMStringList containing strings sorted in ascending order by code unit.

While sorting by code unit is easy to do in a (UTF-16) DOMString, it produces a perverse order for supplementary characters (i.e. those above U+FFFF). Shouldn't this be sorted by code point?
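To make the difference concrete, here is a minimal sketch comparing the two orders. The sample names and the `compareByCodePoint` helper are made up for illustration and do not come from the spec.

```javascript
// Hypothetical names: "a", U+FF5E (a BMP character), and U+1F600
// (a supplementary character, stored as the surrogate pair D83D DE00).
const names = ["\u{1F600}", "\uFF5E", "a"];

// The default sort compares strings by UTF-16 code unit.
const byCodeUnit = [...names].sort();

// Illustrative comparator: walk both strings one code point at a time.
function compareByCodePoint(a, b) {
  const ia = a[Symbol.iterator]();
  const ib = b[Symbol.iterator]();
  for (;;) {
    const ra = ia.next();
    const rb = ib.next();
    // If either string is exhausted, the shorter one sorts first.
    if (ra.done || rb.done) return (ra.done ? 0 : 1) - (rb.done ? 0 : 1);
    const diff = ra.value.codePointAt(0) - rb.value.codePointAt(0);
    if (diff !== 0) return diff;
  }
}

const byCodePoint = [...names].sort(compareByCodePoint);

// By code unit:  "a", U+1F600, U+FF5E  (0xD83D sorts before 0xFF5E)
// By code point: "a", U+FF5E, U+1F600
console.log(byCodeUnit, byCodePoint);
```

The two orders only diverge when a supplementary character is compared against a BMP character in the range U+E000 to U+FFFF, since the surrogate code units (D800 to DFFF) sit below that range.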

Note too that a list sorted by code unit or code point will not be accessible to users.

Note that there is a sorting algorithm here that works on code units

This is I18N comment w3c/i18n-activity#362. Please add the i18n-comment label.

@inexorabletash added the i18n-needs-resolution label on Mar 23, 2017
@inexorabletash
Member

DOMStrings are not UTF-16. They are sequences of 16-bit code units; they are not required to be valid UTF-16.

This is equivalent to calling sort() on an ECMAScript array of strings, which is the model for the behavior. I'll add a note.
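As a rough illustration of that equivalence, the sketch below sorts a made-up list of names (including a lone surrogate, which is allowed in a DOMString but is not valid UTF-16) with the default `sort()` and with an explicit comparator built on the `<` operator, which also compares strings by code unit.

```javascript
// Hypothetical names; "\uD83D" is a lone high surrogate.
const storeNames = ["zebra", "\uD83D", "apple", "\u{1F600}", "Zoo"];

// Default sort: ascending by UTF-16 code unit.
const defaultSorted = [...storeNames].sort();

// Explicit comparator using the relational operators, which compare
// strings code unit by code unit.
const explicitSorted = [...storeNames].sort((a, b) =>
  a < b ? -1 : a > b ? 1 : 0
);

// Both give: "Zoo", "apple", "zebra", "\uD83D", "\uD83D\uDE00"
console.log(
  JSON.stringify(defaultSorted) === JSON.stringify(explicitSorted)
); // true
```

Note that the lone surrogate sorts without error, and that uppercase "Zoo" precedes lowercase "apple" because 0x5A is less than 0x61.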

@inexorabletash
Member

PR at #182

Also note that this is existing behavior from previous iterations of the spec.

@aphillips
Author

aphillips commented Mar 24, 2017

I'm not quite buying it. I mean, yes, they aren't exactly UTF-16, in exactly the same way as we're discussing in the WHATWG infra space just now. But really they are strings and there is a clear relationship to Unicode. The sort you mention is fast and easy, but ignores supplementary characters and the unfortunate sorting of those characters as surrogate pairs. I18n would prefer to make supplementary characters work as first-class citizens. And note that this is an index, not just a bag of strings. There is provision for range operations and these fail utterly on surrogate pairs.
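A sketch of the kind of range mismatch this comment seems to have in mind. It uses a plain in-memory array rather than any IndexedDB API, the keys and the lower bound are made up, and the code point check is simplified to single-character keys.

```javascript
// Hypothetical single-character keys and a lower bound of U+E000.
const keys = ["\u{1F600}", "\uE000", "\uFFFD", "z"];
const lower = "\uE000";

// Range check by code unit (what `>=` does on strings).
// U+1F600 starts with the code unit 0xD83D, which is below 0xE000,
// so it falls outside the range.
const inRangeByCodeUnit = keys.filter((k) => k >= lower);

// Range check by code point (first code point only, for illustration).
// U+1F600 is above U+E000, so it falls inside the range.
const inRangeByCodePoint = keys.filter(
  (k) => k.codePointAt(0) >= lower.codePointAt(0)
);

console.log(inRangeByCodeUnit.length);  // 2
console.log(inRangeByCodePoint.length); // 3
```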

@inexorabletash
Member

"that this is an index, not just a bag of strings"

The "sorted list" construct is used only when getting the names of a set of object stores or indexes:

The names are identifiers specified by the developer in code.

And... this is unchanged since the first version of the spec (the latter is new, but must be consistent), and is interoperably deployed in all browsers already.

@aphillips
Author

I'm doing this on a tablet in an airport, bear with me.

Interoperability is harmed by changing the runtime sort? The names need to stay consistent, but the ordering could be improved. If the list is in a different order on a different browser, is that harmful?

Developers can use non-ASCII names too. Many such are generated from customer data or input as well. "The sorted list is used only when getting the names": probably for presentation. Computers don't care. You'd use a hash otherwise. :-)
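If the names are in fact shown to users, one option is to re-sort them for display, whatever order the API returns. A minimal sketch, assuming an English locale and a made-up list of names; `Intl.Collator` is the standard ECMAScript internationalization API, and nothing here is prescribed by the IndexedDB spec.

```javascript
// Hypothetical names, as they might come back in code unit order.
const returnedNames = ["Zoo", "apple", "zebra"];

// Locale-aware collation for presentation; "en" is an assumption.
const collator = new Intl.Collator("en");
const displayNames = [...returnedNames].sort(collator.compare);

console.log(displayNames); // apple, zebra, Zoo
```

The stored order stays consistent across browsers, while the display order follows the user's language.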

@inexorabletash
Member

As an aside, I think this construct should change to "sorted name list" for clarity, since it's only used for this one purpose. I'll update #184

The list was originally unsorted; then someone complained about the possible inconsistency when enumerating and so we agreed on a sorting convention and enshrined it in the spec. sigh

@inexorabletash
Member

Oops, I meant #182
