I18N string best practices vs. design-principles #454

Open
aphillips opened this issue Oct 5, 2023 · 8 comments
Labels
Agenda+
i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on)
Status: Consensus to write (We have TAG consensus about the principle but someone needs to write it; see "To Write" project)
Status: In Progress (We're working on it but ideas not fully formed yet)

Comments

@aphillips

I18N maintains a set of best practices (see also Editor's Copy). One set of these pertains to the definition of strings.

Design-principles has a set of best practices related to strings that prefer DOMString except when one needs USVString. This guidance is a little unclear, since there are non-DOM/non-JS/non-HTML specs that would prefer to use e.g. xsd:string or a string definition that is close to USVString (based on code points). We (I18N) have recently had to go through this exercise with RDF-star and a couple of other specs and this is causing us to revise our best practices.

It would not be helpful if TAG and I18N recommended different things. Our tendency is to prefer a string definition based on scalar value strings, with an exception for the spaces where UTF-16/WTF-16 (see #323) is the best practice; design-principles is backwards from that. We also want to develop text that explains to specs that touch on UTF-8-based file formats why they want to use DOMString in their interfaces.
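
For readers less familiar with the distinction, here is a minimal TypeScript sketch of what separates the two definitions. A JavaScript string behaves like a DOMString (a sequence of 16-bit code units that may contain unpaired surrogates), while a USVString-style value has each unpaired surrogate replaced with U+FFFD. The helper names `isScalarValueString` and `toScalarValueString` are hypothetical and only illustrate the idea; they are not taken from either document.

```typescript
// Hypothetical helpers, written for illustration only.

/** Returns true if `s` contains no unpaired surrogate code units. */
function isScalarValueString(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low surrogate that completes the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate that was not preceded by a high surrogate.
      return false;
    }
  }
  return true;
}

/** Replaces each unpaired surrogate with U+FFFD, as a USVString-style conversion does. */
function toScalarValueString(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units unchanged.
        out += s.charAt(i) + s.charAt(i + 1);
        i++;
        continue;
      }
    }
    out += isHigh || isLow ? "\ufffd" : s.charAt(i);
  }
  return out;
}
```

Under this sketch, a DOMString-style definition accepts any input as-is, whereas a USVString-style definition only ever sees values for which `isScalarValueString` returns true.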

Note well: we are not disagreeing with the design principles as currently articulated.

I was actioned with making this issue. Please add the i18n-needs-resolution label to this issue. (Shouldn't horizontal groups have permission enough to set horizontal review labels on your document repos?)

@LeaVerou added the i18n-needs-resolution label Oct 6, 2023
@LeaVerou
Member

LeaVerou commented Oct 6, 2023

> I was actioned with making this issue. Please add the i18n-needs-resolution label to this issue. (Shouldn't horizontal groups have permission enough to set horizontal review labels on your document repos?)

I wish this were possible. Unfortunately GitHub’s permission model around labels is all or nothing. Anyhow, I added the label.

@annevk
Member

annevk commented Oct 6, 2023

(You could create a triage team and give them triage permissions. That's how WHATWG attempts to solve this. That gives people some power over issues, but no write access.)

aphillips added a commit to aphillips/bp-i18n-specdev that referenced this issue Oct 12, 2023
- Change the MUSTard to recommend USVString first.
- Split the MUSTard into two BPs
- Add a missing link around one instance of `USVString`
- Add an issue that includes a link to w3ctag/design-principles#454 and
  appropriately scary sounding text.
@torgo added this to the 2023-12-04-week milestone Dec 3, 2023
@torgo added the Agenda+ label Dec 3, 2023
@torgo added the Status: Consensus to write label Dec 4, 2023
@torgo
Member

torgo commented Dec 4, 2023

Thanks for this @aphillips - we agree we need to be more clear - @ylafon took an action on today's TAG call to come up with a PR that adds clarity.

@aphillips
Author

@torgo @ylafon Thanks, although please note that this is not just a request for editing by TAG but for our groups to ensure that we say exactly the same thing. If there are differences, we should ensure that we coordinate or have an agreement about what the recommendations ought to be.

As a reference, here is what we currently recommend: link

@ylafon
Member

ylafon commented Dec 4, 2024

@aphillips reading the note in bp-i18n-specdev, I don't find major inconsistencies between the two versions; the section here goes further in discouraging the use of ByteString.
How is it seen as being different?

@jyasskin
Contributor

jyasskin commented Dec 4, 2024

I think our "use DOMString unless you have a specific reason not to." is too strong and appears to contradict i18n's "Unless you have a reason not to, use a string definition consistent with USVString." I like their text:

> Use a string definition consistent with DOMString if your specification does not process the internal value of strings and is not required to check for unpaired surrogate code points, or if your specification pertains to the [DOM], defines a JavaScript API or data format, or defines strings as opaque values that are not processed.

I feel like @annevk is likely to have the best sense of what's going to go wrong, if anything, if we copy that into the design principles.

@annevk
Member

annevk commented Dec 4, 2024

Is there really any difference between presenting an HTML title element to the end user and any other kind of string? If you are going to convert to a scalar value string for algorithm purposes (URL parser, text encoding, I/O) you might as well use USVString upfront (even though it is arguably wrong as it's not the most efficient way to go about this), but otherwise I think the web platform has decided on plain strings, however ugly we may find them.

The only reason to deviate from this would be some kind of "brave new world" scenario, such as Wasm or ML, and even then I would expect a solid debate.
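
To make the "convert for algorithm purposes" case concrete, here is a rough TypeScript sketch of a UTF-8 encoder. An unpaired surrogate is not a Unicode scalar value and has no UTF-8 encoding, so an encoder has to do something with it; this sketch substitutes U+FFFD, which is the same replacement a USVString conversion performs. The function name `utf8EncodeSketch` is made up for this example, and this is not meant as a replacement for a platform encoder.

```typescript
// Hypothetical helper, written for illustration only.

/** Encodes `s` as UTF-8 bytes, replacing unpaired surrogates with U+FFFD. */
function utf8EncodeSketch(s: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.codePointAt(i) ?? 0xfffd;
    if (cp > 0xffff) {
      i++; // codePointAt consumed a full surrogate pair
    } else if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // unpaired surrogate: not a scalar value, so substitute
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return bytes;
}
```

If the string is only stored, compared, or handed back to script, none of this inspection is needed, which is the case for leaving it as a plain string.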

@aphillips
Author

@ylafon asks:

> I don't find major inconsistencies between the two versions; the section here goes further in discouraging the use of ByteString.
> How is it seen as being different?

I18N's and TAG's recommendations are, unsurprisingly, close to being the same when you peer under the hood and actually read the details. But TAG starts with "use DOMString unless" and I18N starts with "use USVString unless". Careless readers might perceive these as being opposed to one another.

Note that I18N uses very subtle wording: "a string definition consistent with USVString".

@annevk notes:

> If you are going to convert to a scalar value string for algorithm purposes (URL parser, text encoding, I/O) you might as well use USVString upfront (even though it is arguably wrong as it's not the most efficient way to go about this), but otherwise I think the web platform has decided on plain strings, however ugly we may find them.

I tend to agree: most of the time, a string is a bag of code units. And most of the time there is no need to look inside the bag to see if there are surrogates without a dance partner. We do not mean for implementations to actually use a scalar value representation (i.e. the 32-bit one) on the wire ever.

In such a world, the difference is actually about whether we require some code point hygiene or not. It's probably cleaner to use TAG's approach, but our WG members have pushed back on this in the past.

What we want is something pithy and straightforward, where users do not have to read the attendant documentation in order to know what to do and which will NOT result in different interpretations. We want to avoid:

dictionary X {
   DOMString foo;
   USVString bar;    // why??
   DOMString baz;
};

Perhaps something more like the following (unseen by my colleagues and written here for the first time):

> Use DOMString unless you have a specific reason not to, especially when specifying document formats, data structures, protocols, or for API interfaces.
>
> Use USVString when specifying algorithms or processes that handle the contents of strings or for cases in which unpaired surrogates would result in an error.
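
As a rough illustration of the two cases in that wording (a TypeScript sketch only, with hypothetical helper names `storeOpaqueLabel` and `tryPercentEncode`): an interface that merely carries a string never needs to look inside it, while an operation such as percent-encoding fails outright on an unpaired surrogate. The built-in `encodeURIComponent` throws a `URIError` in that situation.

```typescript
// Hypothetical helpers, written for illustration only.

/** DOMString-style use: the value is opaque and is passed through untouched. */
function storeOpaqueLabel(label: string): string {
  return label;
}

/** Content-processing use: returns null when an unpaired surrogate makes encoding impossible. */
function tryPercentEncode(value: string): string | null {
  try {
    return encodeURIComponent(value);
  } catch (e) {
    if (e instanceof URIError) return null; // unpaired surrogate in the input
    throw e;
  }
}

const loneSurrogate = "\ud83d"; // a high surrogate with no partner
const stored = storeOpaqueLabel(loneSurrogate); // round-trips unchanged
const encoded = tryPercentEncode(loneSurrogate); // null: this operation cannot proceed
```

In the first case nothing is gained by sanitizing the input up front; in the second, requiring a scalar value string at the boundary turns a late failure into a defined conversion.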

I have added this discussion to I18N's agenda for our teleconference tomorrow (2024-12-05)
