Support making words definite, indefinite or construct #6
Comments
Are there any specific messages that could benefit from such a mechanism? |
I recommend reviewing this UTW video called Automatic Grammar Agreement in Message Formatting. In languages that frequently gender their nouns, the definite and indefinite article varies a lot, and it depends on the grammatical properties of the noun or adjective adjacent to the article. So if I ever want to say "The ${device} is on", knowing how to put the definite article in front of the device is very important, especially when the vocabulary of "device" is significantly large or provided by the user. For a language like Swedish, you don't add an article, you inflect the word. |
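As a rough illustration of the point above, here is a minimal sketch of how definiteness differs between an article-based language (Spanish, where the article depends on gender) and an inflecting one (Swedish, where the noun itself changes). The lexicon entries and the `make_definite` function are hypothetical and only for illustration; a real inflection engine would rely on full dictionaries and rules, and would handle exceptions this toy version ignores.

```python
# Hypothetical toy lexicons, not taken from any real inflection engine.
SPANISH_GENDER = {"lámpara": "feminine", "televisor": "masculine"}
SWEDISH_DEFINITE = {"lampa": "lampan", "hus": "huset"}


def make_definite(noun: str, language: str) -> str:
    """Return a definite form of `noun` for a few illustrative languages."""
    if language == "en":
        # English: a single invariant article.
        return f"the {noun}"
    if language == "es":
        # Spanish: the article agrees with the noun's grammatical gender.
        article = "la" if SPANISH_GENDER[noun] == "feminine" else "el"
        return f"{article} {noun}"
    if language == "sv":
        # Swedish: no separate article; the noun itself is inflected.
        return SWEDISH_DEFINITE[noun]
    raise ValueError(f"unsupported language: {language}")


print(make_definite("light", "en"))
print(make_definite("lámpara", "es"))
print(make_definite("televisor", "es"))
print(make_definite("lampa", "sv"))
```

The lookup fails with a `KeyError` for any word outside the toy lexicon, which is exactly the problem when the vocabulary is large or supplied by the user.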
Does it mean that the person who authors the message in the first place would need to write: " ${the device} is on"? |
I suspect that in English it's an elision for "turned on", a preposition in a phrasal verb.
|
Sorry for the confusion. Here are two ways that this can be supported that I'm currently aware of for addressing this specific topic. I'm using Spanish in my example.
1. ^[El %@](inflect: true)
2. ${device.definite}
The "%@" in the first example is the variable name with Markdown and JSON syntax. The "device" in the second example is the variable name with UEL <https://en.wikipedia.org/wiki/Unified_Expression_Language> syntax. I don't have a proposed solution for Unicode's MFWG syntax, and I think that should be a separate topic of mapping the concept into syntax. |
In MF2, #2 would be something like:
{$device definiteness=definite}
|
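To make the placeholder-with-option idea concrete, here is a minimal sketch of a formatter that recognizes a placeholder such as `{$device definiteness=definite}` and hands the option to an inflection step. This is not an MF2 parser and does not reflect any real MessageFormat implementation; the regular expression, `format_message`, `apply_options`, and the Spanish lexicon are all assumptions made up for illustration.

```python
import re

# Matches "{$name opt=value ...}". A deliberately simplified, hypothetical syntax subset.
PLACEHOLDER = re.compile(r"\{\$(\w+)((?:\s+\w+=\w+)*)\s*\}")

# Hypothetical Spanish lexicon mapping a bare noun to its definite form.
DEFINITE_ES = {"lámpara": "la lámpara", "televisor": "el televisor"}


def apply_options(value: str, options: dict) -> str:
    """Apply the (assumed) definiteness option; unknown words pass through unchanged."""
    if options.get("definiteness") == "definite":
        return DEFINITE_ES.get(value, value)
    return value


def format_message(pattern: str, **variables: str) -> str:
    """Substitute each placeholder, inflecting the value according to its options."""
    def replace(match: re.Match) -> str:
        name, raw_options = match.group(1), match.group(2)
        options = dict(pair.split("=", 1) for pair in raw_options.split())
        return apply_options(variables[name], options)

    return PLACEHOLDER.sub(replace, pattern)


print(format_message("Se encendió {$device definiteness=definite}.", device="lámpara"))
print(format_message("Dispositivo: {$device}", device="lámpara"))
```

Keeping `apply_options` separate from the placeholder parsing mirrors the idea that the inflection step could sit behind more than one message syntax.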
Thanks for the clarification!
Probably a bit off-topic, but how can we ensure that message authors (i.e. probably developers) use the correct syntax?
|
MF 2 and other systems will detect that the syntax is incorrect, i.e., #1 and #2 are disallowed in MF 2.
|
Just to be sure: if the message author writes " the ${device} is on"
(instead of " ${device, state=definiteness} is on"), is there a way to
detect that the message is ill-formed?
|
> is there a way to detect that the message is ill-formed?

Can you clarify what you mean?
1. If you mean that state=definiteness was not used for a language that would benefit from using such syntax, I don't think that's within the scope here. That seems like a lint/static analyzer topic. I would prefer to talk about how to make it possible instead of worrying about how authors are not benefiting from such functionality. I don't consider it ill-formed in such a situation. It's maybe worthy of a warning in the message formatting framework.
2. If you mean that the device variable is already definite, say it was named "The light", that's easy to detect and leave as is instead of turning it into "The The light".
3. If you mean that the device variable is already definite through other styles, say it was named "My light", and you wanted to change it to "Your light", or you didn't want to turn it into "The My light", that's a harder topic. In that case, it's not about it being ill-formed. It's about grammatical correctness. I'm fine with being aware of such situations. At a certain point, I'd rather defer handling more complex messages to a future date. I just want to handle the simple example in this issue.
|
Sorry for not being clear!
My question comes from some discussion with translators at Google. They often complain that if the original message is not formatted the right way, they cannot change it (maybe this is specific to Google, not sure).
So if they receive a message like:
English: "Welcome"
they cannot produce something like {female {Benvenida}, male {Benvenido}, etc...).
But maybe this is just a limitation on their side, and we should assume that translators can always modify the syntax?
|
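The "easy to detect" case from the list above (a device already named "The light") could be handled along these lines. This is only a sketch for English; `add_definite_article` is a hypothetical helper, and it deliberately does not attempt the harder possessive case ("My light").

```python
def add_definite_article(name: str) -> str:
    """Prefix an English name with "The" unless it already starts with that article."""
    words = name.split(maxsplit=1)
    # Compare whole words, so "Thermostat" is not mistaken for "The ...".
    if words and words[0].lower() == "the":
        return name
    return f"The {name}"


print(add_definite_article("light"))       # not yet definite
print(add_definite_article("The light"))   # already definite, left as is
print(add_definite_article("Thermostat"))  # only looks like it starts with "The"
print(add_definite_article("My light"))    # harder case, not handled by this sketch
```

The last call yields "The My light", which is the kind of grammatical (rather than well-formedness) problem described in point 3.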
That’s a valid issue, but I consider it to be specific to the message format framework and outside the concept of inflection. I don’t believe that the MF2 framework has a connection to the pronoun information or a concept for it. The 3 other frameworks that I’ve been involved with have varying degrees of automatic access to the pronoun information. One requires the message author to adopt the framework extension, which can cause a translator communication issue. The other two can usually inflect anything without developer intervention, but various levels of mistakes by developers can still happen. I’d prefer the inflection engine to be separate from the message format syntax, and I’d prefer to keep message format adoption issues separate from this topic of just adding the ability to add a specific type of definiteness to a word or concept. |
We want the inflection information to work for multiple clients, including but of course not limited to MF2.0. Going back to Bruno's question about:
MF2.0 is still in development, especially the inflection bits, so caveat lector. Say the English is:
In general, it is the localization software that allows translators access to the message. I think the thinking is that certain option values will be translatable (like definiteness and case), so that for translating into German, the translator could replace that message pattern by something like the following. Delete the redundant text, and add the state option.
That would thus handle:
Now, if $device could be plural, the normal mechanism would be the following. Remember, the translator will not see the syntax; it should be presented in a much friendlier way. English
German
Now, if the gender of the device matters (which it does in many languages), then the localization software would expand as follows. So there would be 4 variant sub-messages that would need to be translated. In order for this expansion to occur, we'd have to supply the information that arbitrary objects can be masculine or feminine in French. French
(Forgive my French.) Now, that is if MF2 mostly follows MF1 selection. If it allows for inflection engines that can recast literal data, then this could be simplified down to something like:
{|est allumée.| :reset gender=$device} just means
Note that this is all quite speculative. The syntax and basic functions of MF are in place, but not the extensions for grammar. So the :agree function is not at all defined; that is just an illustration of how it might work, as is :gender above. An interesting point is that the simpler one-line message also requires more knowledge of the translator and/or translation software. Now, the interesting (read "hard") bit is where categories in English (assuming that is the source) don't match the categories in French. For example, suppose that you have to say
That gets tricky |
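The four variant sub-messages mentioned above (plural crossed with gender) can be sketched as a simple cross product. The category names, the `FRENCH_PREDICATES` table, and `select_variant` are assumptions for illustration only; they do not represent MF2 syntax or any real localization tool, and the plural handling is reduced to two categories to match the count of four in the discussion.

```python
from itertools import product

# Assumed selector dimensions: two plural categories times two genders.
PLURAL_CATEGORIES = ("one", "other")
GENDERS = ("masculine", "feminine")

# Every combination a translator would need to cover.
VARIANT_KEYS = list(product(PLURAL_CATEGORIES, GENDERS))

# Hypothetical French sub-messages for "... is on" / "... are on".
FRENCH_PREDICATES = {
    ("one", "masculine"): "est allumé.",
    ("one", "feminine"): "est allumée.",
    ("other", "masculine"): "sont allumés.",
    ("other", "feminine"): "sont allumées.",
}


def select_variant(plural_category: str, gender: str) -> str:
    """Pick the sub-message matching both selectors."""
    return FRENCH_PREDICATES[(plural_category, gender)]


print(len(VARIANT_KEYS))
print(select_variant("one", "feminine"))
```

Each additional grammatical dimension multiplies the number of variants, which is part of the appeal of the single-line form that delegates agreement to an inflection engine.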
…CENSE.txt for copyright and permission details. This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19 This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25 This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
I'd like to nominate this to be resolved in pull request #35. |
Support should be added to make a word definite, indefinite or construct. The construct form is a discussion point for Semitic languages like Hebrew or Arabic.
Here are some examples:
English
Spanish
French
Swedish