-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fallback code for unsupported locales #168
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change findAppropriateLanguage
to findTranslation
and have this return the translation object rather than a string which is a key in that index?
Doing this would let us do something like _.merge({}, englishTranslation, foundTranslation)
and remove all the if not in the lang, look in english
checks.
We could expose the language code selected on this object as well so that someone can use this function to know if their language code is supported by OSRM text instructions.
test/index_test.js
Outdated
modifier: 'left' | ||
}, | ||
name: 'Way Name' | ||
}), 'Gire a la izquierda en Way Name'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turn left
instructions are identical between es
and es-ES
. Can we find an instruction that differs between the two locales?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe it’d be better to unit test getAvailableLanguage()
directly instead of (or in addition to) testing its integration as part of compile()
.
index.js
Outdated
@@ -210,6 +211,30 @@ module.exports = function(version, _options) { | |||
} | |||
|
|||
return output; | |||
}, | |||
findAppropriateLanguage: function(language) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method need to be exported? Also, how about calling it getAvailableLanguage()
or getBestMatchingLanguage()
?
index.js
Outdated
if (languages.supportedCodes.indexOf(language) === -1) throw new Error('language code ' + language + ' not loaded'); | ||
compile: function(originalLanguage, step, options) { | ||
if (!originalLanguage) throw new Error('No language code provided'); | ||
var language = this.findAppropriateLanguage(originalLanguage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several other exported methods take a language parameter as well. We should be consistent; otherwise, a client using the tokenize()
method, for example, will end up with a mismash.
index.js
Outdated
@@ -210,6 +211,30 @@ module.exports = function(version, _options) { | |||
} | |||
|
|||
return output; | |||
}, | |||
findAppropriateLanguage: function(language) { | |||
if (languages.supportedCodes.indexOf(language) > -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
languages.instructions
is an object with language codes as the keys. I think it’d be a bit easier to work with that object than the supported codes array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the list of languages we're using here consistent. An array of locales is little more accurate than a dictionary with keys IMO.
index.js
Outdated
// no country code needed. | ||
return filteredSupportedLanguages.sort(function(a, b) { | ||
return a.length - b.length; | ||
})[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should look for matches in this order (prior art), at least until we can pull in a more sophisticated library that specializes in language fallback logic:
- Same full locale code (e.g.,
lng-Scpt-CC
) - Same language code and script code (
lng-Scpt
) - Same language code and country code (
lng-CC
) - Same language code (
lng
) - Same language code and any script code (
lng-Scpx
) - Same language code and any country code (
lng-CX
) - English (
en
)
It’ll help if we use languages.instructions
instead of languages.supportedCodes
.
/cc @jcsg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that sounds good. I was originally worried that condition 5 might mean that we'd serve Serbian in Latin rather than Cyrillic, but that probably would be better than just serving English for that region, especially if the voice instructions still use Serbian.
Thanks for looping me in, @1ec5 .
We could simplify this PR considerably by removing all the calls to |
index.js
Outdated
|
||
// Same language code (lng) | ||
} else if (supportedLanguageCode.indexOf(languageCode) > -1) { | ||
return languages.supportedCodes[supportedLanguageCode.indexOf(languageCode)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire expression is equivalent to just languageCode
.
index.js
Outdated
return languageCode + '-' + countryCode; | ||
|
||
// Same language code (lng) | ||
} else if (supportedLanguageCode.indexOf(languageCode) > -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is guaranteed to be equivalent to if (language.instructions[languageCode])
.
index.js
Outdated
countryCode = splitLocale[2]; | ||
} | ||
|
||
var supportedLanguageCode = languages.supportedCodes.map(function(locale) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an array, so the name should be plural.
index.js
Outdated
return languages.supportedCodes[supportedLanguageCode.indexOf(languageCode)]; | ||
|
||
// Same language code and any script code (lng-Scpx) | ||
} else if (supportedLanguageCode.indexOf(languageCode) > -1 && scriptCode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we support zh-Hans
but not zh
, and language
is zh-Hant
, then we won’t go into this if
statement because zh-Hant
isn’t one of the supported languages. Instead, we need to look for an item in supportedLanguageCodes
that contains the language code zh
and also contains a script code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 167f7c2
test/index_test.js
Outdated
}); | ||
|
||
t.test('getBestMatchingLanguage', function(t) { | ||
t.assert(compiler('v5').getBestMatchingLanguage('zh-Hant'), 'zh-Hans'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s make sure all the cases above get proper unit tests too, especially es-MX
→ es
. It’ll also be important to make sure we match the language codes case-insensitively, since that’s what the Directions API does.
index.js
Outdated
var scriptCode = false; | ||
var countryCode = false; | ||
|
||
if (splitLocale.lenth === 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: length.
index.js
Outdated
var countryCode = false; | ||
|
||
if (splitLocale.lenth === 1) { | ||
languageCode = splitLocale[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well set languageCode
unconditionally at this point. If language
is something bogus like c++
, then we’ll end up falling back to English anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to keep the splitting code and the return code separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s no else
case here. If language
is the empty string, then we wind up looking up silly values like false-false
in instructions
. I continue to think we should pull this line out of the if
and set it unconditionally.
index.js
Outdated
} else if (splitLocale.length === 3) { | ||
languageCode = splitLocale[0]; | ||
scriptCode = splitLocale[1]; | ||
countryCode = splitLocale[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s a useful reference on the syntax of a BCP47 language tag. We’re currently handling all the parts that a client is likely to pass in.
index.js
Outdated
// Same language code and any script code (lng-Scpx) and the found language contains a script | ||
} else if (supportedLanguageCodes.indexOf(languageCode) > -1 && scriptCode && | ||
languages.supportedCodes[supportedLanguageCodes.indexOf(languageCode)].split('-').length > 1 && | ||
languages.supportedCodes[supportedLanguageCodes.indexOf(languageCode)].split('-')[1].length === 4) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier if, right off the bat, we map each item in supportedCodes
to an object with the keys language
, script
, and region
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd only be used really for this case. I'm fine with keeping this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do the preprocessing suggested above, then this becomes:
else if (availableLanguages.find(function (language) {
return language.language === languageCode && language.script;
}))
and some of the duplication in the next else if
block goes away.
We can do that preprocessing inside languages.js to avoid any meaningful performance penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it around to this in 9a19e1e, not sure if it's any more clear.
index.js
Outdated
})) { | ||
return languages.supportedCodes[supportedLanguageCodes.indexOf(languageCode)]; | ||
// Same language code and any country code (lng-CX) | ||
} else if (supportedLanguageCodes.indexOf(languageCode) > -1 && countryCode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the passed-in language
is pt-PT
and we support pt-BR
(but not pt
), this line checks whether pt-PT
is supported (it isn’t) and pt-PT
has a country code (it does). Consequently, we fail to go in here. Let’s add a test for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index.js
Outdated
}); | ||
|
||
var availableLanguages = languages.supportedCodes.map(function(language) { | ||
return that.divideLanguageCodes(language); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
languages.js currently exports a supportedCodes
containing strings; it should also export supportedLanguages
(or availableLanguages
) that maps to these objects. That way this code only has to run once, no matter how many times getBestMatchingLanguage()
is called.
index.js
Outdated
} else if (availableLanguages.find(function (language) { | ||
return language.languageCode === languageCode && language.scriptCode; | ||
})) { | ||
return languages.supportedCodes[supportedLanguageCodes.indexOf(languageCode)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object returned by find()
should have a locale
property that contains the full string, so that we can just return the locale
property here instead of searching through supportedCodes
.
index.js
Outdated
}, | ||
divideLanguageCodes: function(language) { | ||
// If the language is not found, try a little harder | ||
var splitLocale = language.toLowerCase().split('-'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it’d be cleaner to use a regular expression for this job:
// Two-letter language code, then optional four-letter script code, then optional two-letter region code
var match = language.match(/(\w\w)(?:-(\w\w\w\w))?(?:-(\w\w))?/i);
// Normalize case
var locale = [];
if (match[1]) {
match[1] = match[1].toLowerCase();
locale.push(match[1]);
}
if (match[2]) {
match[2] = match[2][0].toUpperCase() + match[2].substring(1).toLowerCase();
locale.push(match[2]);
}
if (match[3]) {
match[3] = match[3].toUpperCase();
locale.push(match[3]);
}
return {
locale: locale.join("-"),
language: match[1],
script: match[2],
region: match[3]
};
Looks like a lot, but most of it is to normalize the case, which is important for avoiding having to index into supportedLanguages
all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is failing for me. I think both styles of parsing out the codes is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it’s failing, it’s probably because the rest of the code hasn’t been updated to account for the normalized case being used by ☝️.
test/index_test.js
Outdated
t.assert(compiler('v5').getBestMatchingLanguage('es-MX'), 'es'); | ||
t.assert(compiler('v5').getBestMatchingLanguage('es-es'), 'es-es'); | ||
t.end(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s add some tests of divideLanguageCodes()
or whatever we call it once we move it to languages.js.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index.js
Outdated
})) { | ||
return languages.parsedSupportedCodes.find(function (language) { | ||
return language.languageCode === languageCode && language.scriptCode; | ||
}).locale; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of repeating the find()
call, store the result of the first find. It’d be a lot easier to store things in local variables if you replace all the else if
s with if
s – after all, we’re returning early in each case.
test/index_test.js
Outdated
t.deepEqual(languages.parseLanguageIntoCodes('zh'), { countryCode: false, languageCode: 'zh', locale: 'zh', scriptCode: false }); | ||
t.deepEqual(languages.parseLanguageIntoCodes('es-MX'), { countryCode: 'mx', languageCode: 'es', locale: 'es-MX', scriptCode: false }); | ||
t.deepEqual(languages.parseLanguageIntoCodes('es-ES'), { countryCode: 'es', languageCode: 'es', locale: 'es-ES', scriptCode: false }); | ||
t.deepEqual(languages.parseLanguageIntoCodes('pt-PT'), { countryCode: 'pt', languageCode: 'pt', locale: 'pt-PT', scriptCode: false }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Directions API currently allows a language
parameter to be specified in any case, such as pt-pt
or PT-Pt
. If the Directions API is to use getBestMatchingLanguage()
, we should test that this method matches case insensitively, so pt-pt
should also produce pt-BR
.
languages.js
Outdated
locale: language, | ||
languageCode: languageCode, | ||
scriptCode: scriptCode, | ||
countryCode: countryCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s a bit redundant to say “code” in each of these keys.
This PR adds support for trying to find locales which we currently do not support. This gist of it is, that if a developer provides
es-MX
, we should give them instructions provides byes
.This also moves to use
en
when a language is not found./cc @mcwhittemore @1ec5