
Add support for translations over 5000 characters #20

Closed
rawr51919 opened this issue Mar 21, 2019 · 3 comments
Labels
enhancement, help wanted

Comments

@rawr51919

rawr51919 commented Mar 21, 2019

Is it possible for us to implement something similar to what https://github.com/Localize/node-google-translate (https://www.npmjs.com/package/google-translate) does in order to handle translations over 5000 characters? That's the only thing that API has over this one.

@vitalets
Owner

vitalets commented Apr 1, 2019

I think this should be kept out of the library itself.
The straightforward way is to split the large text into chunks of 5k characters manually:

// Split the text into characters and drain it 5000 at a time.
let text = 'large text...'.split('');  // > 5k chars
const chunks = [];
while (text.length) chunks.push(text.splice(0, 5000).join(''));

// Translate all chunks in parallel.
Promise.all(chunks.map(chunk => translate(chunk)));

A more advanced approach is to cut the text only at sentence endings; I assume there is a library for that. Anyway, I suggest keeping the library's scope minimal.
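
For illustration, here is a minimal sketch of that sentence-aware variant, assuming a translate(text) function that returns a Promise with the translated string; the naive regex below only stands in for a proper sentence-splitting library:

// Minimal sketch (not part of the library): split at sentence endings so that
// no chunk exceeds the 5000-character limit, then translate the chunks.
// Assumes translate(text) returns a Promise resolving to the translated string.
const MAX_CHUNK = 5000;

function splitBySentence(text, maxLen = MAX_CHUNK) {
  // Naive sentence split; a single sentence longer than maxLen would still
  // need the blunt character-based split shown above.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxLen) {
      chunks.push(current);
      current = '';
    }
    current += sentence;
  }
  if (current) chunks.push(current);
  return chunks;
}

async function translateLargeText(text) {
  const chunks = splitBySentence(text);
  const translated = await Promise.all(chunks.map(chunk => translate(chunk)));
  return translated.join('');
}

Usage would be translateLargeText(largeText).then(result => ...), with the caveat that chunk boundaries can still affect translation quality.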

@vitalets added the help wanted and enhancement labels on Apr 1, 2019
@rawr51919
Author

rawr51919 commented Apr 1, 2019


So you're saying this is fixable with whatever splitting approach the application chooses? I believe the repo I got the idea from uses this code for that:

// Split into multiple calls if string array is longer than allowed by Google (5k for POST)
var stringSets;
if (shouldSplitSegments(strings)) {
  stringSets = [];
  splitArraysForGoogle(strings, stringSets);
} else if (!Array.isArray(strings)) {
  stringSets = [[strings]];
} else {
  stringSets = [strings];
}

// Request options
var data = { target: targetLang };
if (sourceLang) data.source = sourceLang;

// Run queries async
async.mapLimit(stringSets, concurrentLimit, function(stringSet, done) {

  post('', _.extend({ q: stringSet }, data), parseTranslations(stringSet, done));

}, function(err, translations) {
  if (err) return done(err);

  // Merge and return translation
  translations = _.flatten(translations);
  if (translations.length === 1) translations = translations[0];
  done(null, translations);
});

And splitArraysForGoogle from said code:

// Return array of arrays that are short enough for Google to handle
var splitArraysForGoogle = function(arr, result) {
  if (arr.length > maxSegments || (encodeURIComponent(arr.join(',')).length > maxGetQueryLen && arr.length !== 1)) {
    var mid = Math.floor(arr.length / 2);
    splitArraysForGoogle(arr.slice(0, mid), result);
    splitArraysForGoogle(arr.slice(mid, arr.length), result);
  } else {
    result.push(arr);
  }
};

This approach would also work without the concurrent-translation limit, but if we ever go through with it, the limit could be imposed to help avoid flooding the Google Translate servers.
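
As a rough sketch of what that limit could look like with plain promises (the translateWithLimit helper and the concurrentLimit default are hypothetical, and translate(chunk) is again assumed to return a Promise):

// Minimal sketch of a concurrency cap, similar in spirit to async.mapLimit:
// only `concurrentLimit` translate() calls are in flight at any one time.
async function translateWithLimit(chunks, concurrentLimit = 5) {
  const results = new Array(chunks.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed chunk until none remain.
  async function worker() {
    while (next < chunks.length) {
      const i = next++;
      results[i] = await translate(chunks[i]);
    }
  }

  await Promise.all(Array.from({ length: concurrentLimit }, () => worker()));
  return results;
}

The order of results is preserved, which matters when the chunks are rejoined into a single translation.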

@vitalets
Owner

Yes, I mean this is fixable by application-level code, not inside the library.
