Support making words definite, indefinite or construct #6

Open
grhoten opened this issue Mar 7, 2024 · 14 comments
Labels
discuss Discussion item

Comments

@grhoten
Member

grhoten commented Mar 7, 2024

Support should be added to make a word definite, indefinite or construct. The construct form is a discussion point for Semitic languages like Hebrew or Arabic.

Here are some examples:

English

  • the cat
  • a cat
  • a unicorn
  • an umbrella
  • an LED light

Spanish

  • el gato
  • la gata
  • los gatos
  • las gatas

French

  • la lumière
  • les lumières
  • l'appareil
  • le ventilateur
  • les ventilateurs

Swedish

| case | singular & indefinite | singular & definite | plural & indefinite | plural & definite |
|---|---|---|---|---|
| nominative | katt | katten | katter | katterna |
| genitive | katts | kattens | katters | katternas |
@grhoten grhoten added the discuss Discussion item label Mar 7, 2024
@BrunoCartoni

Are there any specific messages that could benefit from such a mechanism?

@grhoten
Member Author

grhoten commented Mar 13, 2024

I recommend reviewing this UTW video called Automatic Grammar Agreement in Message Formatting. In languages that frequently gender their nouns, the definite and indefinite article varies a lot, and it depends on the grammatical properties of the noun or adjective adjacent to the article. So if I ever want to say "The ${device} is on", knowing how to put the definite article in front of the device is very important, especially when the vocabulary of "device" is significantly large or provided by the user. For a language like Swedish, you don't add an article, you inflect the word.
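
As a rough illustration of the two strategies described above (a separate article in Spanish, an inflected form in Swedish), here is a minimal Python sketch. The lexicon tables and the `make_definite` function are hypothetical and cover only the example words from this issue; a real engine would need a full dictionary and would have to handle exceptions such as Spanish feminine nouns that take "el".

```python
# Spanish: the definite article is a separate word chosen by gender and number.
SPANISH_DEFINITE_ARTICLE = {
    ("masculine", "singular"): "el",
    ("feminine", "singular"): "la",
    ("masculine", "plural"): "los",
    ("feminine", "plural"): "las",
}

# Hypothetical mini-lexicon holding the grammatical properties of each word.
SPANISH_LEXICON = {
    "gato": ("masculine", "singular"),
    "gata": ("feminine", "singular"),
    "gatos": ("masculine", "plural"),
    "gatas": ("feminine", "plural"),
}

# Swedish: definiteness is marked on the noun itself, so the sketch stores
# the definite form for each indefinite form (nominative case only).
SWEDISH_DEFINITE = {"katt": "katten", "katter": "katterna"}


def make_definite(word: str, language: str) -> str:
    """Return the definite form of `word` for the toy lexicons above."""
    if language == "es":
        gender, number = SPANISH_LEXICON[word]
        return f"{SPANISH_DEFINITE_ARTICLE[(gender, number)]} {word}"
    if language == "sv":
        return SWEDISH_DEFINITE[word]
    raise ValueError(f"unsupported language: {language}")


print(make_definite("gatas", "es"))
print(make_definite("katt", "sv"))
```

The point of the sketch is only that the caller asks for "definite" in both cases, while the language-specific data decides whether that means adding a word or changing one.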

@BrunoCartoni

does it mean that the person who authors the message in the first place would need to write:
" ${the device} is on" ?

@macchiati
Member

macchiati commented Mar 14, 2024 via email

@grhoten
Member Author

grhoten commented Mar 14, 2024

does it mean that the person who authors the message in the first place would need to write: " ${the device} is on" ?

Sorry for the confusion. Here are two ways that I'm currently aware of to support this. I'm using Spanish in my examples.

  1. ^[El %@](inflect: true)
  2. ${device.definite}

The "%@" in the first example is the variable name with Markdown and JSON syntax. The "device" in the second example is the variable name with UEL syntax. I don't have a proposed solution for Unicode's MFWG syntax, and I think that should be a separate topic of mapping the concept into syntax.
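
To make the second form more concrete, here is a toy Python sketch of how a formatter might expand a `${device.definite}` placeholder. It is not an implementation of UEL; the regular expression, the `DEFINITE_FORMS` table, and `format_message` are all assumptions made up for illustration.

```python
import re

# Hypothetical lookup standing in for a real Spanish inflection engine.
DEFINITE_FORMS = {"luz": "la luz", "ventilador": "el ventilador"}

# Matches placeholders of the form ${name.definite}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.definite\}")


def format_message(pattern: str, variables: dict) -> str:
    """Replace each ${name.definite} with the definite form of the variable's value."""

    def expand(match: re.Match) -> str:
        value = variables[match.group(1)]
        return DEFINITE_FORMS[value]

    return PLACEHOLDER.sub(expand, pattern)


print(format_message("Enciende ${device.definite}", {"device": "luz"}))
print(format_message("Enciende ${device.definite}", {"device": "ventilador"}))
```

The message author writes one pattern, and the article is chosen from the value at format time.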

@macchiati
Member

macchiati commented Mar 14, 2024 via email

@BrunoCartoni

BrunoCartoni commented Mar 15, 2024 via email

@macchiati
Member

macchiati commented Mar 15, 2024 via email

@BrunoCartoni

BrunoCartoni commented Mar 18, 2024 via email

@grhoten
Member Author

grhoten commented Mar 18, 2024

is there a way to detect that the message is ill-formed?

Can you clarify what you mean?

  1. If you mean that state=definiteness was not used for a language that would benefit from such syntax, I don't think that's within the scope here. That seems like a lint/static analyzer topic. I would prefer to talk about how to make it possible instead of worrying about authors not benefiting from such functionality. I don't consider the message ill-formed in such a situation. It may be worth a warning in the message formatting framework.
  2. If you mean that the device variable is already definite, say it was named "The light", that's easy to detect, and the value can be left as is instead of being turned into "The The light".
  3. If you mean that the device variable is already definite through other styles, say it was named "My light", and you wanted to change it to "Your light", or you didn't want to turn it into "The My light", that's a harder topic. In that case, it's not about it being ill-formed. It's about grammatical correctness. I'm fine with being aware of such situations. At a certain point, I'd rather defer handling more complex messages to a future date. I just want to handle the simple example in this issue.
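
The detection mentioned in case 2 can be sketched in a few lines of Python. This is a simplification for English only: the determiner list is an assumed, non-exhaustive set, `with_definite_article` is a hypothetical name, and the sketch only leaves an already-determined name alone. It does not attempt the rewriting ("My light" to "Your light") described in case 3.

```python
# Assumed, non-exhaustive list of English words that already determine a noun phrase.
ENGLISH_DETERMINERS = (
    "the", "my", "your", "his", "her", "its", "our", "their", "this", "that",
)


def with_definite_article(name: str) -> str:
    """Prefix `name` with "the" unless it already starts with a determiner."""
    words = name.split()
    if words and words[0].lower() in ENGLISH_DETERMINERS:
        return name  # already definite: avoid "The The light" or "The My light"
    return f"the {name}"


print(with_definite_article("light"))
print(with_definite_article("The light"))
print(with_definite_article("My light"))
```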

@BrunoCartoni

BrunoCartoni commented Mar 18, 2024 via email

@grhoten
Member Author

grhoten commented Mar 18, 2024

That’s a valid issue, but I consider that to be a message format framework specific issue outside the concept of inflection. I don’t believe that the MF2 framework has a connection to the pronoun information nor the concept for it.

The 3 other frameworks that I’ve been involved with have varying degrees of automatic access to the pronoun information. One requires the message author to adopt the framework extension, which can cause a translator communication issue. The other two can usually inflect anything without developer intervention, but developers can still make mistakes of various kinds.

I’d prefer the inflection engine to be separate from the message format syntax, and I’d prefer to keep message format adoption issues separate from this topic, which is just about adding the ability to apply a specific type of definiteness to a word or concept.

@macchiati
Member

We want the inflection information to work for multiple clients, including but of course not limited to MF2.0.

Going back to Bruno's question about:

Just to be sure: if the message author writes " the ${device} is on"
(instead of " ${device, state=definiteness} is on"), is there a way to
detect that the message is ill-formed?

MF2.0 is still in development, especially the inflection bits, so caveat lector.

Say the English is:

The {$device} is on.

In general, it is the localization software that allows translators access to the message. I think the thinking is that certain option values will be translatable (like definiteness and case), so that for translating into German, the translator could replace that message pattern by something like the following. Delete the redundant text, and add the state option.

{$device state=definite} ist eingeschaltet.

That would thus handle:

Das Gerät ist eingeschaltet.
Die Maschine ist eingeschaltet.
Der Trockner ist eingeschaltet.

Now, if $device could be plural, the normal mechanism would be the following. Remember, the translator will not see the syntax; it should be presented in a much friendlier way.

English

.match {$deviceCount :number}
one {{The {$device} is on.}}
* {{The {$device} are on.}}

German

.match {$deviceCount :number}
one {{{$device state=definite} ist eingeschaltet.}}
* {{{$device state=definite} sind eingeschaltet.}}

Now, if the gender of the device matters (which it does in many languages), then the localization software would expand as follows. So there would be 4 variant sub-messages that would need to be translated. In order for this expansion to occur, we'd have to supply the information that arbitrary objects can be masculine or feminine in French.

French

.match {$deviceCount :number}{$device :gender}
one feminine {{{$device state=definite} est allumée.}}
one * {{{$device state=definite} est allumé.}}
* feminine {{{$device state=definite} sont allumées.}}
* * {{{$device state=definite} sont allumés.}}

(Forgive my French.)
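
To show how the four French variants above would be chosen at run time, here is a minimal Python sketch of selection over two keys with `*` as a wildcard. It picks the first matching row in listed order, which is a simplification and not the matching algorithm defined by MF2. The `VARIANTS` table and `select_variant` are hypothetical, and the definite noun phrase is passed in already formed.

```python
# Variant table mirroring the French example: (plural category, gender) -> pattern.
# "*" matches any value; rows are listed from most to least specific.
VARIANTS = [
    (("one", "feminine"), "{device} est allumée."),
    (("one", "*"), "{device} est allumé."),
    (("*", "feminine"), "{device} sont allumées."),
    (("*", "*"), "{device} sont allumés."),
]


def select_variant(plural_category: str, gender: str) -> str:
    """Return the first pattern whose keys match the given selector values."""
    for (plural_key, gender_key), pattern in VARIANTS:
        if plural_key in (plural_category, "*") and gender_key in (gender, "*"):
            return pattern
    raise LookupError("no variant matched")


print(select_variant("one", "feminine").format(device="La lumière"))
print(select_variant("other", "masculine").format(device="Les ventilateurs"))
```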

Now, that is if MF2 mostly follows MF1 selection. If it allows for inflection engines that can recast literal data, then this could be simplified down to something like:

{$device state=definite} {|est allumée.| :agree gender=$device plural=$device}

{|est allumée.| :agree gender=$device plural=$device} just means

  1. take the literal text inside of |…| — in this case "est allumée."
  2. change that literal text so that it agrees in gender and plural category with $device.

Note that this is all quite speculative. The syntax and basic functions of MF are in place, but not the extensions for grammar. So the :agree function is not at all defined; that is just an illustration of how it might work, as is :gender above.
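
Purely as an illustration of what such an agreement step might do for this one predicate, here is a Python sketch. It is not a definition of `:agree`. The function names are invented, the verb is fixed to "être" in the third person, and the participle rule (add "e" for feminine, "s" for plural) only holds for regular participles such as "allumé".

```python
def agree_participle(masculine_singular: str, gender: str, plural_category: str) -> str:
    """Inflect a regular French past participle for gender and number."""
    form = masculine_singular
    if gender == "feminine":
        form += "e"
    if plural_category != "one":
        form += "s"
    return form


def agree_predicate(gender: str, plural_category: str) -> str:
    """Build "est/sont allumé(e)(s)." so that it agrees with the subject."""
    verb = "est" if plural_category == "one" else "sont"
    participle = agree_participle("allumé", gender, plural_category)
    return f"{verb} {participle}."


print("La lumière " + agree_predicate("feminine", "one"))
print("Les ventilateurs " + agree_predicate("masculine", "other"))
```

Even this toy version needs to know the gender and plural category of `$device`, which is the extra knowledge the one-line message depends on.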

An interesting point is that the simpler one-line message also requires more knowledge of the translator and/or translation software.


Now, the interesting (read "hard") bit is where categories in English (assuming that is the source) don't match the categories in French. For example, suppose that you have to say

Les appareils sont allumés.
Les machines sont en marche.

That gets tricky.

grhoten pushed a commit that referenced this issue Oct 30, 2024
…CENSE.txt for copyright and permission details.

This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
grhoten added a commit that referenced this issue Oct 30, 2024
grhoten added a commit that referenced this issue Nov 30, 2024
nciric pushed a commit that referenced this issue Dec 10, 2024
@grhoten
Member Author

grhoten commented Dec 10, 2024

I'd like to nominate this to be resolved in pull request #35.

Projects
Status: In Progress
3 participants