Define technical terms #30

echeran · 2020-02-04T00:11:27Z

This isn't a feature so much as an attempt to see if we can all define some of the technical terms that we've been using, based on what they mean to us. From the rich conversations we've had in meetings and in Github issue threads, I suspect that we may be using different terms to describe the same concept, or even the same term to refer to different concepts.

The thought is that if we each individually fill out our definitions for terms in a separate comment, we can compare notes at the end. And maybe some good consequences can pop out from that (reduced vocabulary -> clearer convos? realization of more common ground?)

I've gone through a few of the Github threads with the largest use of technical-sounding terms (skipping over things like linguistic terms), and listed them in order of first observed occurrence. To participate, just copy-paste the terms that mean something to you into a new comment below, and define them in your own words.

DOM Overlay 1 2
interpolate 1 2 3
translation merging
syntax - 1 2 3
authoring - 1 2
selector - 1 2 3
file format - 1 2 3
markup - 1
placeholder type
ITS data category 1 2
UI language/locale - 1
placeholder/variable locale / formatting locale - 1 2 3
resource locale - 1
compound message - 1
placeable - 1
locale chain - 1
language fallback - 1 2 3
language negotiation - 1 2
full message - 1
fragment message / sentence fragment - 1 2
localizable resource - 1
API argument syntax - 1
storage format / representation - 1 2
binding syntax - 1
API - 1 2 3 4
implementation - 1 2 3
positional variable - 1
standard message format - 1
spec - 1
interchange format / representation - 1 2
source code representation - 1
data model - 1 2
serialization - 1
runtime format - 1 2
build/parse-time format - 1
translation/localization format - 1 2
nested markup - 1
multi-variant message - 1
intermediate format / representation - 1
consumed format - 1
AST (abstract syntax tree) - 1 2 3 4
import/export filter for a format - 1
developer format - 1
multi-level filter for a format - 1
message syntax - 1

mimckenna · 2020-02-12T17:09:57Z

UI language/locale

The language and jurisdiction used to determine content in the User Interface. E.g. en_US/CA means use content for Canada in the US English language.

mimckenna · 2020-02-12T17:16:17Z

placeholder/variable locale / formatting locale

The locale (regional language) used for placeholder content or to format dynamic variables. E.g. a placeholder variable may be for a currency value 1234567.89 INR. In a US locale (en_US/US) it would appear as ₹1,234,567.89 INR, in Ukraine (en_US/UA) as 1.234.567,89 ₹ INR, and in India (en_GB/IN) as ₹ 12,34,567.89 INR.
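To illustrate how the formatting locale alone changes the rendering of the same value, here is a minimal Python sketch. The `group_digits` helper and its `style` names are hypothetical and written only for this illustration; it assumes a non-negative value and ignores currency symbols, and a real application would take grouping rules from CLDR data rather than hard-coding them.

```python
def group_digits(value: float, style: str) -> str:
    """Format a non-negative value with two decimals using a grouping style.

    `style` is a hypothetical switch: "western" groups by threes,
    "indian" keeps the last three digits and then groups by twos.
    """
    integer, fraction = f"{value:.2f}".split(".")
    if style == "western":
        groups = []
        while len(integer) > 3:
            groups.insert(0, integer[-3:])
            integer = integer[:-3]
        groups.insert(0, integer)
    elif style == "indian":
        groups = [integer[-3:]]
        integer = integer[:-3]
        while len(integer) > 2:
            groups.insert(0, integer[-2:])
            integer = integer[:-2]
        if integer:
            groups.insert(0, integer)
    else:
        raise ValueError(f"unknown grouping style: {style}")
    return ",".join(groups) + "." + fraction


print(group_digits(1234567.89, "western"))  # 1,234,567.89
print(group_digits(1234567.89, "indian"))   # 12,34,567.89
```

The message content stays the same in both calls; only the formatting convention differs, which is the distinction between the resource locale and the formatting locale.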

mimckenna · 2020-02-12T17:22:48Z

resource locale

The locale used for content in the user interface. This is not always the formatting locale. For example, in iOS and Android, a user may choose a language for content, and "regional settings" for formats - I can choose to have my UI in American English but my regional settings for dates, time, numbers, calendar to follow European or even Chinese conventions.

mimckenna · 2020-02-12T17:29:10Z

locale chain

This refers to a list of approved locales to choose content from if content for the requested locale does not exist. Internally, we use this to keep the user flow within legally approved content, since we are legally obligated to present certain terms and incur liabilities if we do not.

For a user, this could refer to their list of languages in order of preference, similar to the HTTP Accept-Language list.

mimckenna · 2020-02-12T17:44:21Z

language fallback

If content is not available in the requested language, this is the process to "fall back" to the next available language in the locale chain.
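As a minimal sketch of that process, assuming the locale chain is an ordered list and the available content is known as a set of locale identifiers (the function name `fall_back` is hypothetical):

```python
from typing import Optional


def fall_back(locale_chain: list[str], available: set[str]) -> Optional[str]:
    """Return the first locale in the chain that has content, or None."""
    for locale in locale_chain:
        if locale in available:
            return locale
    return None


# The requested locale fr-CA has no content, so the next entry is used.
print(fall_back(["fr-CA", "fr", "en"], {"fr", "en"}))  # fr
```

Returning `None` when the chain is exhausted leaves it to the caller to decide what to show, which matters when only approved content may be displayed.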

zbraniecki · 2020-02-12T18:53:16Z

@mimckenna is there a value in diverging at all from https://unicode.org/reports/tr35/#Identifiers ?

mimckenna · 2020-02-13T16:48:51Z

@zbraniecki - good point - I was rattling off how I would describe these terms to members of my technical team. Now that you bring it up, I agree that we should use pre-existing definitions in Unicode or other accepted standards, with links back to those definitions, where they exist. I'll be happy to revise my first attempts above following that model.

I think the purpose of this doc is to provide very concise descriptions of each term as opposed to the multi-paragraph descriptions in tr35. We can wordsmith these down to single-sentence descriptions with pointers back to reference standards.

Some of these terms/concepts may not be found in tr35, such as AST (abstract syntax tree), and some others have fairly complete but lengthy descriptions that would be difficult, as is, to fit in a single sentence when pulled from TR35. An example would be the developer viewpoint (developer format) of message syntax vs the translator view of messages to translate (translation/localization format). This is implied in TR35 but it is far from concise.

mimckenna · 2020-02-13T16:50:52Z

Question - shall I direct-edit my initial responses, or revise as additional github comments in the thread?

dchiba · 2020-02-13T18:24:34Z

locale chain is called language priority list in BCP 47. Similarly, language fallback is known as lookup matching. Maybe there should be a mention at the end, e.g. "... This is the so-called language priority list in formal BCP 47 terminology." or something similar.

zbraniecki · 2020-02-13T19:05:23Z

I don't think we should follow BCP47 btw. Unicode UTS #35 is more up to date.

> Similarly, language fallback is known as lookup matching

I don't think that's accurate. Lookup matching is a particular strategy of language negotiation.
There are others possible. For example, in Gecko we use three - https://firefox-source-docs.mozilla.org/intl/locale.html#filtering-matching-lookup

echeran · 2020-02-13T23:20:13Z

> Question - shall I direct-edit my initial responses, or revise as additional github comments in the thread?

Revising in the form of additional comments (maybe all batched together as one comment?) sounds good.

At this point, I think it's good to get as many responses recorded as possible first. Afterwards, we can see what we observe & discuss. I'm interested in waiting for everyone's responses because I think the responses in aggregate can give us better clarity than just a few.

echeran · 2020-02-14T01:13:41Z

Here are my responses for the terms I've used:

interpolate - formatting & inserting values inside of a string
translation merging - combining the translated version of content back into the source document
syntax - the arrangement of tokens in a file (or equivalent) according to a set of rules. syntax is a prerequisite for semantics (meaning).
file format - similar to syntax. the syntax (& semantics) of a file.
markup - a syntax that allows the more essential data to be annotated by secondary data, not necessarily specific to HTML/XML
placeholder type - whether a placeholder represents number/plurals, gender, etc.
language fallback - for locales (lang+region), not just language alone -- a mechanism for determining an acceptable locale (based on lang and/or region) when info for the exact locale is not available
API - the exact function names with argument lists and types and expected behaviors / output for a particular software
implementation - how a particular specification of inputs, behaviors, and outputs is achieved for a particular programming language or platform
spec - high-level description of expected inputs, behaviors, and outputs
positional variable - a placeholder within a message whose value is injected according to a specific index in a list of provided values
data model - the structure of data that describes how a message should be formatted. is independent of implementation. a part of the specification
serialization - how to turn data structures to/from text/bytes
AST (abstract syntax tree) - a tree showing the structure of tokens in a file, according to the syntax. is the output of a parser, usually according to a grammar, most often used in the context of a compiler
import/export filter for a format - "filter" is an Okapi term for a file format reader/writer that converts a file into the Okapi data model for an l10n document
multi-level filter for a format - an Okapi file format reader/writer that supports the nesting of two different file formats in one doc (ex: JSON file whose strings contain HTML)
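A few of these definitions (interpolate, positional variable, serialization) can be shown in a short Python sketch. It uses Python's built-in `str.format` and `json` only as stand-ins; the message text and the `message` dictionary are made up for illustration and are not a proposed syntax or data model.

```python
import json

# Interpolation with positional variables: values are injected by index.
positional = "{0} sent {1} files".format("Ana", 3)

# The same message with named variables: values are injected by name.
named = "{sender} sent {count} files".format(sender="Ana", count=3)

# Serialization: a hypothetical message structure turned into text and back.
message = {"id": "files-sent", "pattern": "{sender} sent {count} files"}
text = json.dumps(message)
restored = json.loads(text)

print(positional)           # Ana sent 3 files
print(named == positional)  # True
print(restored == message)  # True
```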

dchiba · 2020-02-14T19:58:08Z

@zbraniecki This is for defining terminology, so I didn't mean to suggest following BCP47. I meant to use standard terms such as language priority list and lookup from BCP47, which defines elements and schemes for dealing with locales.

BCP47 says filtering and lookup are the 2 basic types of matching schemes. The former can return multiple locales, while the latter always returns one. Mozilla's "Matching" appears to be lookup performed on each requested locale. Isn't it an enhanced form of lookup?
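The difference between the two schemes can be sketched in Python as follows. This is a simplified reading of basic filtering and lookup: it ignores the `*` wildcard, the special handling of single-character subtags during truncation, and the default value that lookup normally returns; the function names are hypothetical.

```python
from typing import Optional


def basic_filter(priority_list: list[str], available: list[str]) -> list[str]:
    """Filtering: every available tag that equals a range or extends it."""
    matches = []
    for language_range in priority_list:
        prefix = language_range.lower()
        for tag in available:
            lowered = tag.lower()
            is_match = lowered == prefix or lowered.startswith(prefix + "-")
            if is_match and tag not in matches:
                matches.append(tag)
    return matches


def lookup(priority_list: list[str], available: list[str]) -> Optional[str]:
    """Lookup: the single best tag, truncating each range from the right."""
    by_lowercase = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lowercase:
                return by_lowercase[candidate]
            subtags.pop()  # drop the last subtag and try again
    return None


available = ["de", "de-CH", "en", "fr"]
print(basic_filter(["de"], available))            # ['de', 'de-CH']
print(lookup(["de-CH-1996", "en"], ["de", "en"]))  # de
```

Filtering can return several tags for one range, while lookup stops at the first match it finds.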

UTS 35 is a more practical specification adopted by CLDR and others. I think it is a way to practice BCP47 and I would agree with you that we may find it reasonable to have some deviations in the corners.

zbraniecki · 2020-02-14T20:30:26Z

> Isn't it an enhanced form of lookup?

It is. But BCP47 specifies how the algorithms should work, and I don't think it's the only available way, hence I'd prefer not to overspecify that yet :)

Also, I noticed "language fallback" and "locale chain" used separately.

I'd like to suggest we use "locale fallback chain" - a list of locales created as a result of locale negotiation between available and requested locales.
In general, I'd suggest we never talk about a single locale, since all our operations are intended to fall back in case of errors and missing data.
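One possible way to picture such a chain is the sketch below. It is not Gecko's algorithm or any library's API; `negotiate` is a hypothetical helper that picks the best available match for each requested locale by truncating subtags, and appending a default locale at the end is an assumption made here so that the chain is never empty.

```python
def negotiate(requested: list[str], available: list[str], default: str) -> list[str]:
    """Build a fallback chain: one match per requested locale, then the default."""
    by_lowercase = {tag.lower(): tag for tag in available}
    chain = []
    for locale in requested:
        subtags = locale.lower().split("-")
        while subtags:
            match = by_lowercase.get("-".join(subtags))
            if match is not None:
                if match not in chain:
                    chain.append(match)
                break
            subtags.pop()  # try a less specific form of the requested locale
    if default not in chain:
        chain.append(default)
    return chain


print(negotiate(["fr-CA", "de-AT", "en-US"], ["fr", "en-US", "de"], "en-US"))
# ['fr', 'de', 'en-US']
```

Every later operation can then walk this list in order instead of depending on a single locale.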

mihnita · 2020-02-15T03:09:36Z

Looks like this area is fuzzy: matching / enhanced form of lookup / fallback / negotiation
I can explain how Android works.

But it is negotiation (once, at application launch time) followed by fallback (for every resource load)

The UI / formatting locales are not 100% separate, you will not see French UI with German dates (unless you did something wrong in the localization :-)

I can add a couple of joke definitions:

localization: using local variables in your code
globalization: using global variables in your code

More serious definitions:

localization (l10n) : what translators do
internationalization (i18n) : what programmers do
globalization (g11n) : what companies do

They are over-simplifications, but are memorable and clarify what the "buckets" are.

Sure, "translators" really mean localization companies + localization engineers + linguistic / technical QA + PMs + language managers + terminology managers + language specialists + reviewers + DTP, etc, a full machinery.
Same as "programmers" means also UX, PMs, QA, tech writers, etc. It also means architecture, data (and database) structures, design documents, etc, not just "sit down and write code"

L10N takes "assets" from human language A and gives back the same assets in languages X, Y, Z, ...

I18N makes sure applications work without any code changes in any human language.
If I see an English string in a French application it might be a localization bug (a translator missed it) or an internationalization one (hard-coded)

G11N means market research, financial, tech infrastructure, legal, competition, decisions to localize or not (you can go to a new country without translating), etc.

See how much I had to write, just to (partially) explain 3 short bullets?
:-)

echeran · 2020-02-18T23:37:40Z

Thanks for the replies so far.

> See how much I had to write, just to (partially) explain 3 short bullets?

That was indeed thorough (thanks), but 1-2 sentences per term would have sufficed. More to the point, here are sets of terms whose meanings I want to figure out. Are they the same? related? different? and how so? My hunch is that we can dedupe some of these terms to have clearer conversations (or be less likely to talk past each other).

selector
placeholder type
ITS data category

placeholder
placeable
fragment message / sentence fragment
variable (positional / named)

API argument syntax
storage format / representation
binding syntax
standard message format
interchange format / representation
source code representation
runtime format
build/parse-time format
translation/localization format
intermediate format / representation
consumed format
AST (abstract syntax tree)
developer format
message syntax

romulocintra · 2020-07-20T21:32:06Z

@echeran this can be closed, no? I think all terms are already merged into the glossary, is that correct?

echeran · 2020-07-20T22:26:05Z

Yes, I agree, all these terms are contained in the glossary, so we should be fine to close this issue. I'll do that now.

romulocintra added the documentation label Feb 19, 2020

echeran mentioned this issue Feb 21, 2020

Meeting Agenda : 2020-02-24 #46

Closed

romulocintra mentioned this issue Apr 13, 2020

Meeting Agenda : 2020-04-20 #74

Closed

echeran mentioned this issue Apr 20, 2020

Updated terminology wiki page #78

Closed

echeran closed this as completed Jul 20, 2020

echeran mentioned this issue May 19, 2022

Naming things #248

Closed
