Add support for translations over 5000 characters #20
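I think this should be handled outside the library itself:

```javascript
// Split a long string into chunks of at most 5000 characters
let text = 'large text...'.split(''); // >5k chars
const chunks = [];
while (text.length) chunks.push(text.splice(0, 5000).join(''));

// Translate all chunks in parallel (`translate` is this library's translate function)
Promise.all(chunks.map(chunk => translate(chunk)))
  .then(results => { /* merge the translated chunks here */ });
```

A more advanced approach is to cut the text only at sentence endings; I assume there is a library for that. Anyway, I suggest keeping the library's scope minimal.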
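As a minimal sketch of the sentence-ending approach without pulling in an extra library, the hypothetical `chunkBySentences` helper below (not part of this library) packs whole sentences into chunks of at most `maxLen` characters. It assumes a sentence ends with `.`, `!` or `?` followed by whitespace, joins the sentences inside a chunk with a single space (so the original whitespace between sentences, including newlines, is not preserved), and falls back to hard cuts for any single sentence longer than `maxLen`.

```javascript
// Hypothetical helper (not part of this library): split `text` into chunks of at
// most `maxLen` characters, cutting only at sentence endings where possible.
// Assumes a sentence ends with ".", "!" or "?" followed by whitespace.
function chunkBySentences(text, maxLen = 5000) {
  // Split after sentence-ending punctuation, keeping the punctuation with its sentence
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    // Fall back to hard cuts for a single sentence that is itself too long
    if (sentence.length > maxLen) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < sentence.length; i += maxLen) {
        chunks.push(sentence.slice(i, i + maxLen));
      }
      continue;
    }
    // Join with a single space; the original whitespace between sentences is not kept
    const candidate = current ? current + ' ' + sentence : sentence;
    if (candidate.length > maxLen) {
      // Adding this sentence would overflow: close the current chunk and start a new one
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Its output could be fed to the same `Promise.all(chunks.map(chunk => translate(chunk)))` pattern used in the first snippet.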
So you're saying this is fixable in whatever code uses the library? I believe the repo I got the idea from uses this code for that:

```javascript
// Excerpt: shouldSplitSegments, post, parseTranslations, concurrentLimit, async and _
// come from the surrounding code in that repo
// Split into multiple calls if string array is longer than allowed by Google (5k for POST)
var stringSets;
if (shouldSplitSegments(strings)) {
  stringSets = [];
  splitArraysForGoogle(strings, stringSets);
} else if (!Array.isArray(strings)) {
  stringSets = [[strings]];
} else {
  stringSets = [strings];
}

// Request options
var data = { target: targetLang };
if (sourceLang) data.source = sourceLang;

// Run queries async
async.mapLimit(stringSets, concurrentLimit, function(stringSet, done) {
  post('', _.extend({ q: stringSet }, data), parseTranslations(stringSet, done));
}, function(err, translations) {
  if (err) return done(err);

  // Merge and return translation
  translations = _.flatten(translations);
  if (translations.length === 1) translations = translations[0];
  done(null, translations);
});
```

And:

```javascript
// Return array of arrays that are short enough for Google to handle
var splitArraysForGoogle = function(arr, result) {
  if (arr.length > maxSegments || (encodeURIComponent(arr.join(',')).length > maxGetQueryLen && arr.length !== 1)) {
    var mid = Math.floor(arr.length / 2);
    splitArraysForGoogle(arr.slice(0, mid), result);
    splitArraysForGoogle(arr.slice(mid, arr.length), result);
  } else {
    result.push(arr);
  }
};
```

This approach would also work without the concurrency limit, but if we ever implement it, the limit could be kept to avoid flooding the Google Translate servers.
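The recursive halving in `splitArraysForGoogle` can be sketched as a self-contained function. This is a hypothetical rewrite, not code from either repo: `splitSegments`, `maxSegments` and `maxQueryLen` are illustrative names, and the limits are passed as parameters instead of the module-level constants the excerpt relies on. As in the excerpt, a single segment is never split, so one oversized string still needs character-level chunking first. `maxSegments` is assumed to be at least 1; otherwise the recursion never terminates.

```javascript
// Hypothetical self-contained version of the recursive halving in the excerpt.
// Splits an array of strings into batches that have at most `maxSegments` items
// and whose URL-encoded, comma-joined form is at most `maxQueryLen` characters.
// Assumes maxSegments >= 1.
function splitSegments(arr, maxSegments, maxQueryLen, result = []) {
  const tooMany = arr.length > maxSegments;
  // Measure the URL-encoded size as the excerpt does; a single segment is never split further
  const tooLong = encodeURIComponent(arr.join(',')).length > maxQueryLen && arr.length !== 1;
  if (tooMany || tooLong) {
    // Halve the batch and recurse on each half
    const mid = Math.floor(arr.length / 2);
    splitSegments(arr.slice(0, mid), maxSegments, maxQueryLen, result);
    splitSegments(arr.slice(mid), maxSegments, maxQueryLen, result);
  } else {
    result.push(arr);
  }
  return result;
}
```

Halving keeps the recursion depth logarithmic in the number of segments, at the cost of batches that are not always packed as tightly as a greedy fill would pack them.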
Yes, I mean this is fixable in application-level code, not inside the library.
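For the application-level route, a concurrency cap like the one `async.mapLimit` provides in the excerpt can be written with plain promises. The `mapWithLimit` helper below is a hypothetical sketch (not part of this library or of async.js): it runs `worker` over `items` with at most `limit` calls in flight and returns the results in input order. It assumes `limit` is at least 1.

```javascript
// Hypothetical application-level helper: run `worker` over `items` with at most
// `limit` calls in flight, preserving input order in the results (similar in
// spirit to async.mapLimit, but promise-based and dependency-free).
// Assumes limit >= 1.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners and wait for all of them to drain the queue
  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) runners.push(runner());
  await Promise.all(runners);
  return results;
}
```

For example, `mapWithLimit(chunks, 2, chunk => translate(chunk))` would translate the chunks two at a time, with `translate` being this library's function and the limit of 2 chosen arbitrarily. If any call rejects, the returned promise rejects as well.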
Is it possible for us to implement something similar to what https://github.com/Localize/node-google-translate / https://www.npmjs.com/package/google-translate does in order to handle translations over 5000 characters? That's the only thing that API has over this one.