Language may need country codes added (countries and their subdivisions) #80

Closed
Yunin opened this issue Dec 2, 2020 · 1 comment

Yunin commented Dec 2, 2020

Hello!
When we get a language detection result, it only contains the main language and no information about regional variants. It would be much better if more information could be added, such as ISO 3166-1 codes. For example, “豪华套间” and "豪華套間" are both Chinese and have the same meaning. “豪华套间” is Simplified Chinese (Mainland China, ISO 3166-1 code CN), while "豪華套間" is Traditional Chinese (Hong Kong, ISO 3166-1 code HK).

enum class LanguageWithArea(
    val isoCode3166_1: String,
    val language: Language
) {
    // ...
}
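
To make the proposal above concrete, here is a minimal, self-contained sketch of what such an enum might look like. Everything in it is an assumption for illustration only: the `Language` enum is a hypothetical stand-in for the library's own type, and the entry names `CHINESE_CN` and `CHINESE_HK` and the helper `forRegion` do not exist in the library.

```kotlin
// Hypothetical stand-in for the library's Language enum,
// declared here only so the sketch compiles on its own.
enum class Language { CHINESE, ENGLISH }

// Pairs a detected language with an ISO 3166-1 alpha-2 region code.
enum class LanguageWithArea(
    val isoCode3166_1: String,
    val language: Language
) {
    // Illustrative entries only; a real list would be much longer.
    CHINESE_CN("CN", Language.CHINESE),
    CHINESE_HK("HK", Language.CHINESE);

    companion object {
        // Looks up an entry by region code, ignoring case; returns null if unknown.
        fun forRegion(isoCode: String): LanguageWithArea? =
            values().firstOrNull { it.isoCode3166_1.equals(isoCode, ignoreCase = true) }
    }
}
```

With this shape, `LanguageWithArea.forRegion("HK")?.language` would still yield `Language.CHINESE`, so callers that only care about the main language would not need to change.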

@pemistahl (Owner)

@Yunin I haven't differentiated between Simplified Chinese and Traditional Chinese so far. The reason is that I could not find proper text corpora written in only one of the two variants. That's why I used a mixed corpus instead and only added CHINESE as a language without any further differentiation.
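
For readers who need a stopgap, a character-based heuristic can sometimes separate the two variants without a trained model. The sketch below is not how the library works and is not part of it. The names `ChineseScript` and `guessChineseScript` are hypothetical, and the character sets contain only the characters that differ in the example from this issue, so a real implementation would need a full variant table.

```kotlin
// Hypothetical result type for the heuristic.
enum class ChineseScript { SIMPLIFIED, TRADITIONAL, UNDETERMINED }

// Tiny hand-picked sets taken from the example in this issue
// (华/華 and 间/間); assumed to be far from complete.
private val simplifiedOnly = setOf('华', '间')
private val traditionalOnly = setOf('華', '間')

// Counts variant-specific characters and returns the variant with more hits.
fun guessChineseScript(text: String): ChineseScript {
    val simplifiedHits = text.count { it in simplifiedOnly }
    val traditionalHits = text.count { it in traditionalOnly }
    return when {
        simplifiedHits > traditionalHits -> ChineseScript.SIMPLIFIED
        traditionalHits > simplifiedHits -> ChineseScript.TRADITIONAL
        else -> ChineseScript.UNDETERMINED
    }
}
```

Text made up only of characters shared by both variants, such as "豪" or "套", produces `UNDETERMINED`, which is one reason a heuristic like this cannot replace corpus-based detection.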

I might work on this in the future, but I cannot yet say when. That's why I will close this issue for now.
