Localize "shortcodes.ts" #53
Comments
I've thought about it and it's a bit complicated at the moment, since there can be so many permutations, especially with i18n. #49 |
Well, at least there are only so many i18n choices to pick from, and if a user sets the consumer software to Spanish, that software can decide to provide Spanish shortcodes. #49 is on another level entirely, in my opinion... as it stands, multiple pieces of software currently seem to expect that users are going to learn their specific terminology. They can probably be helped along the learning curve by providing suggestions based on what they type, while being shown emojibase's "canonical" interpretation. But that, of course, is up to the developer to decide. After looking at https://github.com/milesj/emojibase/blob/master/packages/data/fr/raw.json, I see that English shortcodes are present, and the en/raw.json could be used as a fallback. Moreover, some tags only appear once and could be considered shortcodes as such (unless I am misunderstanding something). Example: "blasé" for unamused 😒. With the current format, the first array index could be considered the canonical emojibase shortcode, I guess. I'm a bit confused about where the data comes from, and who decides what. If I were to make a PR to clean up shortcodes for a language, would that be acceptable, for instance? |
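To make the fallback idea above concrete, here is a minimal sketch, assuming each locale exposes a map from emoji hexcode to a list of shortcodes whose first entry is treated as canonical. The `ShortcodeMap` type, the `resolveShortcodes` name, and the sample entries are hypothetical and are not part of emojibase.

```typescript
// Hypothetical shape: emoji hexcode -> shortcodes, first entry treated as canonical.
type ShortcodeMap = Record<string, string[]>;

// Prefer the localized shortcodes; fall back to the English ones when the
// locale has no entry (or an empty entry) for this emoji.
function resolveShortcodes(
  hexcode: string,
  localized: ShortcodeMap,
  english: ShortcodeMap,
): string[] {
  const local = localized[hexcode];
  if (local !== undefined && local.length > 0) {
    return local;
  }
  const fallback = english[hexcode];
  return fallback !== undefined ? fallback : [];
}

// Made-up sample data for illustration only.
const frCodes: ShortcodeMap = { '1F612': ['blasé'] };
const enCodes: ShortcodeMap = {
  '1F612': ['unamused'],
  '1F31B': ['first_quarter_moon_with_face'],
};

const unamused = resolveShortcodes('1F612', frCodes, enCodes); // French entry exists
const moon = resolveShortcodes('1F31B', frCodes, enCodes); // falls back to English
```

Whether a consumer should fall back silently or show both the localized and the English codes is a design choice left to the downstream software.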
@MayeulC Shortcodes are hardcoded (https://github.com/milesj/emojibase/blob/master/packages/generator/src/resources/shortcodes.ts) and are not derived from the raw annotations/tags. That's what makes them a bit hard to maintain. We'd need to figure out a strategy to properly support shortcodes for all locales, instead of just 1 locale. |
Have you considered basing localizations off of Unicode CLDR? |
@strixaluco All of the annotations/labels are based on that data. I chose not to use them for shortcodes since they're... not really short, most of them are super long. |
Does seeing only English labels/annotations no matter the chosen language mean that I should report the issue to the downstream project? |
@strixaluco Not really, nothing much they can do about it. It's just something I'd need to figure out and there isn't really a best option at the moment. |
@milesj I'm sorry for bugging you with this again, I just wanted to clear things up. Could you please tell me whether the following statements are correct?
|
@strixaluco 1 is correct, 2 is slightly off. The The Now the following questions arise:
|
Thank you for the clarification. Re 2. Re 1.
Further thoughts: As was discussed in #49, lack of standardization is the main problem here, and in my opinion the best strategy would be to stick to Unicode as closely as possible. Upon reading UTS #51 in particular and unicode.org resources in general, it seems that the terminology used there is slightly confusing, but I assume that the label scheme is the following: names:
annotations:
Per UTS #51:
Unfortunately, I couldn't find another document that has TTS names within its scope, but it seems they might be shorter than CLDR names, and that would fit the aim of this discussion. Obviously, bringing changes to Unicode standards will require a lot of collaborative effort, discussion and time, but it can be a future solution. |
@strixaluco All good points. This is my current thought process on how to solve this.
1 - Generate shortcodes for each locale using the localized annotation. This would create a file like so:
2 - Create shortcode presets based off popular platforms, and move the hard-coded emojibase shortcodes into a preset.
3 - Mark the current hard-coded shortcodes as "legacy" and create a new emojibase preset that aligns more closely with the Unicode name, instead of an emotion.
4 - Update APIs to stitch multiple shortcode presets together into a single dataset. This allows consumers to use emojibase + slack + localized shortcodes, for example:
fetchFromCDN('de/data.json', 'latest', { shortcodes: ['emojibase', 'slack', 'locale'] });
flattenEmojiData(data, [emojibaseCodes, slackCodes, localeCodes]);
1, 3, and 4 are rather easy. Could probably knock those out in a day. 2 is the complicated one, as I'm not sure where to fetch those platform-specific shortcodes from. The final open question is whether the presets (in 2) should also be localized. I'm leaning yes, which is where crowdsourcing might come into play. |
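The "stitching" in step 4 could look roughly like the sketch below. It assumes each preset is a plain map from emoji hexcode to one or more shortcodes, and that presets listed first take priority. The `ShortcodePreset` and `Emoji` types, the `stitchShortcodes` name, and the sample data are hypothetical illustrations and do not describe how `fetchFromCDN` or `flattenEmojiData` are actually implemented.

```typescript
// Hypothetical shape: emoji hexcode -> a single shortcode or a list of them.
type ShortcodePreset = Record<string, string | string[]>;

interface Emoji {
  hexcode: string;
  label: string;
  shortcodes?: string[];
}

// Merge presets in the order given. Shortcodes from earlier presets come
// first in the result, and duplicates across presets are dropped.
function stitchShortcodes(emojis: Emoji[], presets: ShortcodePreset[]): Emoji[] {
  return emojis.map((emoji) => {
    const merged: string[] = [];
    for (const preset of presets) {
      const entry = preset[emoji.hexcode];
      if (entry === undefined) {
        continue; // this preset has nothing for this emoji
      }
      const codes = Array.isArray(entry) ? entry : [entry];
      for (const code of codes) {
        if (merged.indexOf(code) === -1) {
          merged.push(code);
        }
      }
    }
    return { ...emoji, shortcodes: merged };
  });
}

// Made-up sample presets for illustration only.
const emojibaseCodes: ShortcodePreset = { '1F612': 'unamused' };
const slackCodes: ShortcodePreset = { '1F612': ['unamused', 'meh'] };
const localeCodes: ShortcodePreset = { '1F612': 'blasé' };

const stitched = stitchShortcodes(
  [{ hexcode: '1F612', label: 'unamused face' }],
  [emojibaseCodes, slackCodes, localeCodes],
);
```

Keeping the preset order significant means a consumer can decide which source supplies the first, "canonical" shortcode simply by reordering the list.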
Super! From a "non-English user of Element" perspective, I'd imagine 1. is already a win and even more than that if |
This will be resolved in the next major. Will publish a pre-release and do some testing. |
Thanks a lot for looking into this 🎉 If I may add something regarding crowdsourcing, I feel it is quite important to provide native speakers with an avenue to improve their shortcodes (be it new coinage, synonyms, fixing typos) in a place that is easy enough to find :) Thanks a lot again, I'll recommend this project around when translated shortcodes are asked for! |
@KovalevArtem Correct! For "shortcodes", they don't support spaces, so underscores are used instead. |
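As a rough illustration of the underscore rule, a localized annotation could be turned into a shortcode along these lines. This is a minimal sketch: the `labelToShortcode` helper is hypothetical, and the choices to lowercase the label and to keep accented characters are assumptions, not emojibase's documented behavior.

```typescript
// Hypothetical helper: derive a shortcode from a localized annotation label.
// Assumption: only whitespace needs replacing; accented letters are kept as-is.
function labelToShortcode(label: string): string {
  return label
    .trim() // drop leading/trailing whitespace
    .toLowerCase()
    .replace(/\s+/g, '_'); // each run of whitespace becomes one underscore
}

const frenchCode = labelToShortcode('visage blasé');
const englishCode = labelToShortcode('  Unamused   Face ');
```

A real generator would likely also need a rule for punctuation and for collisions between emoji that share a label, which this sketch does not attempt.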
It is somewhat awkward to type an English description of an emoji to enter it, or to have to know its English "meaning", when typing in another language. Moreover, people not proficient in English are unduly penalized when using otherwise-translated pieces of software that leverage this database (I'm thinking of riot.im at least). People are not going to teach English to their relatives (grandparents, etc.) just because they would like to type 🌛 or something similar.
I guess the easiest way would be to provide more shortcodes.ts files, but crowd-sourcing could also be handled through Weblate (ideally providing the emoji as context).
See: element-hq/element-web#11013 and possibly https://github.com/vector-im/riot-web/issues/9298