Caching translations #201

Closed
jerinphilip opened this issue Jun 29, 2021 · 7 comments

@jerinphilip
Contributor

For potential use in outbound translation (#78) and https://github.com/XapaJIaMnu/translateLocally, we're implementing a translation caching mechanism.

The plan is to implement a simple LRU cache (using a map + linked list) with

  • Key: std::string, or equivalently marian::Words
  • Value: History

It was agreed offline that the value should be History. Since marian::Words is what generates a History, I propose we implement a hash for marian::Words and use it as the key. With this, the cache involves only marian primitives plus the cache code itself.
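
A minimal sketch of what such a hasher could look like, assuming marian::Words behaves like a std::vector<Word> and each Word exposes its integer id via something like toWordIndex():

#include <cstddef>
#include <functional>

// Sketch only: boost-style hash_combine over the word ids of a segment.
struct WordsHasher {
  std::size_t operator()(const marian::Words &words) const {
    std::size_t seed = words.size();
    for (const marian::Word &word : words) {
      std::size_t h = std::hash<std::size_t>{}(word.toWordIndex());
      seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};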

The easiest way to get this into the existing source is during the preparation of a Request: Segments = marian::Words, and we can add a stage during construction where we pre-fill the histories.

{
  counter_ = segments_.size();
  histories_.resize(segments_.size(), nullptr);
  // If there are no segments_, we are never able to trigger the responseBuilder
  // calls from a different thread. However, in this case we want an empty valid
  // response.
  if (segments_.size() == 0) {
    responseBuilder_(std::move(histories_));
  }
}

The iteration mechanism that converts a Request into batches can now skip entries whose Histories are already computed (not nullptr). The rest of the pipeline (Response construction matching the Request, etc.) follows unchanged.

This means adding an isCacheCompleted() on RequestSentence, relaying back to isCacheCompleted(size_t index) on Request, for use in the batch preparation below (a sketch of the skip follows the snippet).

void Batcher::addWholeRequest(Ptr<Request> request) {
  for (size_t i = 0; i < request->numSegments(); i++) {
    RequestSentence sentence(i, request);
    size_t bucket_id = sentence.numTokens();
    assert(bucket_id < bucket_.size());
    bucket_[bucket_id].insert(sentence);
  }
}
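
A hedged sketch of that skip, assuming the proposed isCacheCompleted() accessor exists on RequestSentence:

// Sketch only: skip sentences whose History was already filled from the
// cache during Request construction; everything else is batched as before.
void Batcher::addWholeRequest(Ptr<Request> request) {
  for (size_t i = 0; i < request->numSegments(); i++) {
    RequestSentence sentence(i, request);
    if (sentence.isCacheCompleted()) {
      continue;  // History already available; nothing to translate.
    }
    size_t bucket_id = sentence.numTokens();
    assert(bucket_id < bucket_.size());
    bucket_[bucket_id].insert(sentence);
  }
}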

The cache will have to be thread-safe, since multiple async calls can potentially read simultaneously. A cache fetch also updates the recently-used ordering (a write), so guarding access with a single mutex seems reasonable(?).
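
For concreteness, a minimal sketch of that naive LRU (hash map + list behind one mutex), reusing the WordsHasher sketched above; Ptr and History are marian's types, and equality on marian::Words is assumed:

#include <list>
#include <mutex>
#include <unordered_map>
#include <utility>

// Naive thread-safe LRU sketch: a std::list keeps recency order, an
// unordered_map points into it. One mutex serializes all access, since
// even a fetch moves the entry to the front (a write).
class LRUCache {
public:
  explicit LRUCache(size_t capacity) : capacity_(capacity) {}

  bool fetch(const marian::Words &key, Ptr<History> &value) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) {
      return false;
    }
    order_.splice(order_.begin(), order_, it->second);  // mark most recent
    value = it->second->second;
    return true;
  }

  void insert(const marian::Words &key, Ptr<History> value) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = value;
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (index_.size() >= capacity_ && !order_.empty()) {
      index_.erase(order_.back().first);  // evict least recently used
      order_.pop_back();
    }
    order_.emplace_front(key, value);
    index_[key] = order_.begin();
  }

private:
  using Entry = std::pair<marian::Words, Ptr<History>>;
  size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<marian::Words, std::list<Entry>::iterator, WordsHasher> index_;
  std::mutex mutex_;
};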

The cache will be populated in Request::processHistory:

void Request::processHistory(size_t index, Ptr<History> history) {
  // Concurrently called by multiple workers as a history from translation is
  // ready. The container storing histories is set with the value obtained.
  histories_[index] = history;
  // In case this is last request in, completeRequest is called, which sets the
  // value of the promise.
  if (--counter_ == 0) {
    responseBuilder_(std::move(histories_));
  }
}
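
Where the cache insert could slot in, sketched with a hypothetical cache_ member on Request:

// Sketch only: cache_ is a hypothetical member holding the shared cache.
void Request::processHistory(size_t index, Ptr<History> history) {
  histories_[index] = history;
  if (cache_) {
    cache_->insert(segments_[index], history);  // key: the segment's Words
  }
  if (--counter_ == 0) {
    responseBuilder_(std::move(histories_));
  }
}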

cc @kpu @XapaJIaMnu

@XapaJIaMnu
Collaborator

> The easiest way to get this into the existing source is during the preparation of a Request: Segments = marian::Words, and we can add a stage during construction where we pre-fill the histories.

I am not too familiar with the design of bergamot-translator, but basically the way I see it:

Current workflow:
input -> save spaces -> ssplit -> batch -> translate -> reapply spaces.

We want to have:
input -> save spaces -> ssplit -> cache -> batch -> translate -> cache -> reapply spaces.

On the first trip, the cache module separates sentences that have already been translated from those that have not, and sends only the latter for translation. On return, it merges the cached sentences with the freshly translated ones and then updates the cache (in a background thread while the spaces are being re-applied). Once everything is ready, it calls the callback.
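
A rough sketch of that filter/merge step; translateWithCache and the LRUCache interface are stand-ins for whatever the cache module ends up exposing, not existing bergamot-translator API:

#include <vector>

// Hypothetical: split sentences into cache hits (reused immediately) and
// misses (sent for translation), then merge and update the cache.
void translateWithCache(LRUCache &cache,
                        const std::vector<marian::Words> &sentences,
                        std::vector<Ptr<History>> &histories) {
  std::vector<size_t> misses;
  for (size_t i = 0; i < sentences.size(); ++i) {
    Ptr<History> hit;
    if (cache.fetch(sentences[i], hit)) {
      histories[i] = hit;    // already translated: reuse the stored History
    } else {
      misses.push_back(i);   // not in cache: schedule for translation
    }
  }
  // Translate only `misses`, merge their Histories back into `histories`,
  // update the cache in a background thread, then fire the callback.
}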

> It was agreed offline that the value should be History. Since marian::Words is what generates a History, I propose we implement a hash for marian::Words and use it as the key. With this, the cache involves only marian primitives plus the cache code itself.

Why do we need to hash marian::Words? We only need to worry about the input string (with sentence boundaries). We can assume the output is going to be the same every time, can't we?

> The cache will have to be thread-safe, since multiple async calls can potentially read simultaneously. A cache fetch also updates the recently-used ordering (a write), so guarding access with a single mutex seems reasonable(?).

bergamot-translator itself is not thread-safe. We can have a single thread populate the cache before invoking the callback.

@kpu
Member

kpu commented Jun 30, 2021

What do you mean bergamot-translator is not thread safe? One can submit multiple translation tasks concurrently and get them back. I thought I fixed that.

There are concurrent hash table implementations out there. An ersatz version of this is bucket-level (or band-level) read-write locks instead of slapping a mutex around the whole thing.
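
A rough sketch of the bucket-level locking shape (not a recommendation of any particular library): hash the key once to pick a shard, then take only that shard's read-write lock.

#include <array>
#include <cstddef>
#include <shared_mutex>
#include <unordered_map>

// Sharded map: contention is limited to one shard instead of a global mutex.
template <class Key, class Value, class Hash = std::hash<Key>>
class ShardedMap {
public:
  bool find(const Key &key, Value &value) {
    Shard &shard = shardFor(key);
    std::shared_lock<std::shared_mutex> lock(shard.mutex);  // shared: many readers
    auto it = shard.map.find(key);
    if (it == shard.map.end()) return false;
    value = it->second;
    return true;
  }

  void insert(const Key &key, const Value &value) {
    Shard &shard = shardFor(key);
    std::unique_lock<std::shared_mutex> lock(shard.mutex);  // exclusive: one writer
    shard.map[key] = value;
  }

private:
  static constexpr std::size_t kShards = 64;
  struct Shard {
    std::shared_mutex mutex;
    std::unordered_map<Key, Value, Hash> map;
  };
  Shard &shardFor(const Key &key) { return shards_[Hash{}(key) % kShards]; }
  std::array<Shard, kShards> shards_;
};

For an LRU on top of this, each shard would keep its own recency list and evict locally, so even the ordering update stays confined to one shard.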

@XapaJIaMnu
Collaborator

Oh, didn't know it was fixed

@jerinphilip
Contributor Author

> I am not too familiar with the design of bergamot-translator [...]

I'm bringing in the draft implementation for a more concrete picture. Working with sentences might be possible with some reorganization. Key = marian::Words -> Value = Ptr<History> is just the right granularity to be plucked out of bergamot at the right place. The preprocessing and reconstruction with spaces are cheap enough not to need caching.

> On the first trip, the cache module separates sentences that have already been translated from those that have not, and sends only the latter for translation. On return, it merges the cached sentences with the freshly translated ones and then updates the cache (in a background thread while the spaces are being re-applied). Once everything is ready, it calls the callback.

I hadn't accounted for a single writer flushing all histories into the cache at the end. That seems like the better thing to do; I will incorporate it.

> There are concurrent hash table implementations out there. An ersatz version of this is bucket-level (or band-level) read-write locks instead of slapping a mutex around the whole thing.

Any references you recommend? Is there an eviction policy we agree on, and in that case, what data structure for the ordering? Also, isn't updating the ordering a write every time there is a cache fetch or update? I'm hoping to fix the API so the integrations work now with the naive implementation, and keep working when we replace struct LRUCache with something better.

@XapaJIaMnu
Collaborator

Hey. We had a discussion with Jerin about the cache implementation. Here are the main points:

  1. There are very few use cases where one would actually have two translations running concurrently and contending for the cache. The one I can think of is when you have two web pages open. That being said, there is no reason not to use a high-performance concurrent implementation.

  2. No wheel reinventing. Use one from the internet. We can use Intel TBB's concurrent_unordered_map and concurrent_vector https://spec.oneapi.com/versions/latest/elements/oneTBB/source/containers.html and start from there. Even better, some full-fledged solution like https://github.com/microsoft/L4 (which would have to have its use of boost removed). @kpu do you have a preferred off-the-shelf solution to this problem?

  3. Evictions. We don't really need a full-fledged LRU, I feel. We are dealing with text here, and we can allow something like 50 MB/100 MB that would never be filled during the lifetime of a program. And even if it is, random eviction would work fine (sketched after this list).

My suggestion would be to try to fit Microsoft's L4 to our needs, if it is easy enough to strip boost out of it.
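
A naive sketch of point 3 (size bound + random eviction) on a plain std::unordered_map, just to illustrate the policy; WordsHasher and Ptr<History> are the assumptions from earlier, and synchronization is left to the caller:

#include <random>
#include <unordered_map>

using Cache = std::unordered_map<marian::Words, Ptr<History>, WordsHasher>;

// Evict a pseudo-random entry once the size bound is hit, then insert.
void insertWithRandomEviction(Cache &cache, size_t maxEntries,
                              std::mt19937 &rng, const marian::Words &key,
                              Ptr<History> value) {
  if (maxEntries > 0 && cache.size() >= maxEntries) {
    // Jump to a random non-empty bucket and evict its first entry.
    size_t bucket = rng() % cache.bucket_count();
    while (cache.bucket_size(bucket) == 0) {
      bucket = (bucket + 1) % cache.bucket_count();
    }
    marian::Words victim = cache.begin(bucket)->first;  // copy key before erase
    cache.erase(victim);
  }
  cache[key] = value;
}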

@kpu
Member

kpu commented Jul 26, 2021

https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic2 would be nice but it's C++20.
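
For reference, the shape that would take with C++20's std::atomic partial specialization for std::shared_ptr; History here is marian's type:

#include <atomic>
#include <memory>

// A cache slot whose History can be read and replaced without a mutex.
struct CacheSlot {
  std::atomic<std::shared_ptr<marian::History>> history;
};

std::shared_ptr<marian::History> read(const CacheSlot &slot) {
  return slot.history.load();  // consistent snapshot of the pointer
}

void publish(CacheSlot &slot, std::shared_ptr<marian::History> h) {
  slot.history.store(std::move(h));  // atomically swap in the new History
}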

@jerinphilip
Contributor Author

Oh, we fixed this in #227.
