Language maybe need add country codes(countries and their subdivisions) #80

Yunin · 2020-12-02T04:13:07Z

Hello!
When we get a language detection result, it contains only the main language and no region-specific variant information. Adding more information, such as ISO 3166-1 country codes, would make the result much more useful. For example, “豪华套间” and "豪華套間" are both Chinese and have the same meaning. “豪华套间” is Simplified Chinese (Mainland China, ISO 3166-1 code CN), while "豪華套間" is Traditional Chinese (Hong Kong, ISO 3166-1 code HK).

enum class LanguageWithArea(
    val isoCode3166_1: String,
    val language: Language
) {
    ... ...
}
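
To make the idea more concrete, here is a minimal, self-contained sketch of how such an enum might look. Everything in it is an assumption for illustration only: the `Language` enum below is a one-entry stand-in rather than the library's real type, and the entry names `CHINESE_CN` and `CHINESE_HK` as well as the helper `variantsOf` are hypothetical.

```kotlin
// Stand-in for the library's Language enum (assumption); only the entry needed here.
enum class Language { CHINESE }

// Hypothetical pairing of an ISO 3166-1 alpha-2 region code with a language.
enum class LanguageWithArea(
    val isoCode3166_1: String,
    val language: Language
) {
    CHINESE_CN("CN", Language.CHINESE), // Simplified Chinese, Mainland China
    CHINESE_HK("HK", Language.CHINESE); // Traditional Chinese, Hong Kong

    companion object {
        // Hypothetical helper: all regional variants registered for a language.
        fun variantsOf(language: Language): List<LanguageWithArea> =
            values().filter { it.language == language }
    }
}

fun main() {
    // List the region codes known for Chinese in this sketch.
    for (variant in LanguageWithArea.variantsOf(Language.CHINESE)) {
        println("${variant.name}: ${variant.isoCode3166_1}")
    }
}
```

With this shape, a detection result could still report the main language through `language`, while callers that care about the region could read `isoCode3166_1`.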


pemistahl · 2020-12-09T10:55:01Z

@Yunin I haven't differentiated between Simplified Chinese and Traditional Chinese so far. The reason is that I could not find proper text corpora written in only one of the two variants. That's why I used a mixed corpus instead and only added CHINESE as a language without any further differentiation.

I might work on this in the future, but I cannot tell you exactly when yet. That's why I will close this issue for now.

pemistahl closed this as completed Dec 9, 2020

reececomo mentioned this issue Dec 18, 2023

Simplified & Traditional Chinese #192

Open
