Caching translations #201

Closed
jerinphilip opened this issue Jun 29, 2021 · 7 comments

@jerinphilip
Contributor

For potential use in outbound translation (#78) and https://github.com/XapaJIaMnu/translateLocally, we're implementing a translation caching mechanism.

The plan is to implement a simple LRU cache (using a map + linked list) with

  • Key: std::string, or equivalently marian::Words
  • Value: History

It was agreed offline that the value should be History. Since marian::Words is what generates a History, I propose we implement a hash for marian::Words and use it as the key. With this, the cache involves only marian primitives plus the cache code itself.
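
A minimal sketch of what such a hasher could look like, assuming marian::Words behaves like a std::vector<Word> and each Word exposes its integer id via something like toWordIndex():

#include <cstddef>
#include <functional>

// Sketch only: boost-style hash_combine over the word ids of a segment.
struct WordsHasher {
  std::size_t operator()(const marian::Words &words) const {
    std::size_t seed = words.size();
    for (const marian::Word &word : words) {
      std::size_t h = std::hash<std::size_t>{}(word.toWordIndex());
      seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};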

The easiest way to get this into the existing source is during the preparation of a Request: Segments = marian::Words, and we can add a stage during construction where we pre-fill the histories.

{
  counter_ = segments_.size();
  histories_.resize(segments_.size(), nullptr);
  // If there are no segments_, we are never able to trigger the responseBuilder
  // calls from a different thread. However, in this case we want an empty valid
  // response.
  if (segments_.size() == 0) {
    responseBuilder_(std::move(histories_));
  }
}

The iteration mechanism that converts a Request into batches can now skip entries whose Histories are already computed (not nullptr). The rest of the pipeline (Response construction matching the Request, etc.) follows unchanged.

This means adding an isCacheCompleted() on RequestSentence, relaying back to isCacheCompleted(size_t index) on Request, for use in the batch preparation below (a sketch of the skip follows the snippet).

void Batcher::addWholeRequest(Ptr<Request> request) {
  for (size_t i = 0; i < request->numSegments(); i++) {
    RequestSentence sentence(i, request);
    size_t bucket_id = sentence.numTokens();
    assert(bucket_id < bucket_.size());
    bucket_[bucket_id].insert(sentence);
  }
}
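
A hedged sketch of that skip, assuming the proposed isCacheCompleted() accessor exists on RequestSentence:

// Sketch only: skip sentences whose History was already filled from the
// cache during Request construction; everything else is batched as before.
void Batcher::addWholeRequest(Ptr<Request> request) {
  for (size_t i = 0; i < request->numSegments(); i++) {
    RequestSentence sentence(i, request);
    if (sentence.isCacheCompleted()) {
      continue;  // History already available; nothing to translate.
    }
    size_t bucket_id = sentence.numTokens();
    assert(bucket_id < bucket_.size());
    bucket_[bucket_id].insert(sentence);
  }
}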

The cache will have to be thread-safe, since multiple async calls can potentially read simultaneously. A cache fetch also updates the recently-used ordering (a write), so guarding access with a single mutex seems reasonable(?).
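
For concreteness, a minimal sketch of that naive LRU (hash map + list behind one mutex), reusing the WordsHasher sketched above; Ptr and History are marian's types, and equality on marian::Words is assumed:

#include <list>
#include <mutex>
#include <unordered_map>
#include <utility>

// Naive thread-safe LRU sketch: a std::list keeps recency order, an
// unordered_map points into it. One mutex serializes all access, since
// even a fetch moves the entry to the front (a write).
class LRUCache {
public:
  explicit LRUCache(size_t capacity) : capacity_(capacity) {}

  bool fetch(const marian::Words &key, Ptr<History> &value) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) {
      return false;
    }
    order_.splice(order_.begin(), order_, it->second);  // mark most recent
    value = it->second->second;
    return true;
  }

  void insert(const marian::Words &key, Ptr<History> value) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = value;
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (index_.size() >= capacity_ && !order_.empty()) {
      index_.erase(order_.back().first);  // evict least recently used
      order_.pop_back();
    }
    order_.emplace_front(key, value);
    index_[key] = order_.begin();
  }

private:
  using Entry = std::pair<marian::Words, Ptr<History>>;
  size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<marian::Words, std::list<Entry>::iterator, WordsHasher> index_;
  std::mutex mutex_;
};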

The cache will be populated in Request::processHistory:

void Request::processHistory(size_t index, Ptr<History> history) {
  // Concurrently called by multiple workers as a history from translation is
  // ready. The container storing histories is set with the value obtained.
  histories_[index] = history;
  // In case this is last request in, completeRequest is called, which sets the
  // value of the promise.
  if (--counter_ == 0) {
    responseBuilder_(std::move(histories_));
  }
}
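
Where the cache insert could slot in, sketched with a hypothetical cache_ member on Request:

// Sketch only: cache_ is a hypothetical member holding the shared cache.
void Request::processHistory(size_t index, Ptr<History> history) {
  histories_[index] = history;
  if (cache_) {
    cache_->insert(segments_[index], history);  // key: the segment's Words
  }
  if (--counter_ == 0) {
    responseBuilder_(std::move(histories_));
  }
}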

cc @kpu @XapaJIaMnu

@XapaJIaMnu
Collaborator

> The easiest way to get this into the existing source is during the preparation of a Request: Segments = marian::Words, and we can add a stage during construction where we pre-fill the histories.

I am not too familiar with the design of bergamot-translator, but basically the way I see it:

Current workflow:
input -> save spaces -> ssplit -> batch -> translate -> reapply spaces.

We want to have:
input -> save spaces -> ssplit -> cache -> batch -> translate -> cache -> reapply spaces.

On the first trip, the cache module separates sentences that have already been translated from those that have not, and sends only the latter for translation. On return, it merges the cached sentences with the freshly translated ones and then updates the cache (in a background thread while the spaces are being re-applied). Once everything is ready, it calls the callback.
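
A rough sketch of that filter/merge step; translateWithCache and the LRUCache interface are stand-ins for whatever the cache module ends up exposing, not existing bergamot-translator API:

#include <vector>

// Hypothetical: split sentences into cache hits (reused immediately) and
// misses (sent for translation), then merge and update the cache.
void translateWithCache(LRUCache &cache,
                        const std::vector<marian::Words> &sentences,
                        std::vector<Ptr<History>> &histories) {
  std::vector<size_t> misses;
  for (size_t i = 0; i < sentences.size(); ++i) {
    Ptr<History> hit;
    if (cache.fetch(sentences[i], hit)) {
      histories[i] = hit;    // already translated: reuse the stored History
    } else {
      misses.push_back(i);   // not in cache: schedule for translation
    }
  }
  // Translate only `misses`, merge their Histories back into `histories`,
  // update the cache in a background thread, then fire the callback.
}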

> It was agreed offline that the value should be History. Since marian::Words is what generates a History, I propose we implement a hash for marian::Words and use it as the key. With this, the cache involves only marian primitives plus the cache code itself.

Why do we need to hash marian::Words? We only need to worry about the input string (with sentence boundaries). We can assume the output is going to be the same every time, can't we?

> The cache will have to be thread-safe, since multiple async calls can potentially read simultaneously. A cache fetch also updates the recently-used ordering (a write), so guarding access with a single mutex seems reasonable(?).

bergamot-translator itself is not thread-safe. We can have a single thread populate the cache before invoking the callback.

@kpu
Member

kpu commented Jun 30, 2021

What do you mean bergamot-translator is not thread safe? One can submit multiple translation tasks concurrently and get them back. I thought I fixed that.

There are concurrent hash table implementations out there. An ersatz version of this is bucket-level (or band-level) read-write locks instead of slapping a mutex around the whole thing.
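
A rough sketch of the bucket-level locking shape (not a recommendation of any particular library): hash the key once to pick a shard, then take only that shard's read-write lock.

#include <array>
#include <cstddef>
#include <shared_mutex>
#include <unordered_map>

// Sharded map: contention is limited to one shard instead of a global mutex.
template <class Key, class Value, class Hash = std::hash<Key>>
class ShardedMap {
public:
  bool find(const Key &key, Value &value) {
    Shard &shard = shardFor(key);
    std::shared_lock<std::shared_mutex> lock(shard.mutex);  // shared: many readers
    auto it = shard.map.find(key);
    if (it == shard.map.end()) return false;
    value = it->second;
    return true;
  }

  void insert(const Key &key, const Value &value) {
    Shard &shard = shardFor(key);
    std::unique_lock<std::shared_mutex> lock(shard.mutex);  // exclusive: one writer
    shard.map[key] = value;
  }

private:
  static constexpr std::size_t kShards = 64;
  struct Shard {
    std::shared_mutex mutex;
    std::unordered_map<Key, Value, Hash> map;
  };
  Shard &shardFor(const Key &key) { return shards_[Hash{}(key) % kShards]; }
  std::array<Shard, kShards> shards_;
};

For an LRU on top of this, each shard would keep its own recency list and evict locally, so even the ordering update stays confined to one shard.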

@XapaJIaMnu
Collaborator

Oh, didn't know it was fixed

@jerinphilip
Contributor Author

> I am not too familiar with the design of bergamot-translator [...]

I'm bringing in the draft implementation for a more concrete picture. Working with sentences might be possible with some reorganization. Key = marian::Words -> Value = Ptr<History> is just the right granularity to be plucked out of bergamot at the right place. The preprocessing and reconstruction with spaces are cheap enough not to need caching.

> On the first trip, the cache module separates sentences that have already been translated from those that have not, and sends only the latter for translation. On return, it merges the cached sentences with the freshly translated ones and then updates the cache (in a background thread while the spaces are being re-applied). Once everything is ready, it calls the callback.

I hadn't accounted for a single writer flushing all histories into the cache at the end. That seems like the better thing to do; I will incorporate it.

> There are concurrent hash table implementations out there. An ersatz version of this is bucket-level (or band-level) read-write locks instead of slapping a mutex around the whole thing.

Any references you recommend? Is there an eviction policy we agree on, and in that case, what data structure for the ordering? Also, isn't updating the ordering a write every time there is a cache fetch or update? I'm hoping to fix the API so the integrations work now with the naive implementation, and keep working when we replace struct LRUCache with something better.

@XapaJIaMnu
Collaborator

Hey. We had a discussion with Jerin about the cache implementation. Here are the main points:

  1. There are very few use cases where one would actually have two translations running concurrently and contending for the cache. The one I can think of is when you have two web pages open. That being said, there is no reason not to use a high-performance concurrent implementation.

  2. No wheel reinventing. Use one from the internet. We can use Intel TBB's concurrent_unordered_map and concurrent_vector https://spec.oneapi.com/versions/latest/elements/oneTBB/source/containers.html and start from there. Even better, some full-fledged solution like https://github.com/microsoft/L4 (which would have to have its use of boost removed). @kpu do you have a preferred off-the-shelf solution to this problem?

  3. Evictions. We don't really need a full-fledged LRU, I feel. We are dealing with text here, and we can allow something like 50 MB/100 MB that would never be filled during the lifetime of a program. And even if it is, random eviction would work fine (sketched after this list).

My suggestion would be to try to fit Microsoft's L4 to our needs, if it is easy enough to strip boost out of it.
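
A naive sketch of point 3 (size bound + random eviction) on a plain std::unordered_map, just to illustrate the policy; WordsHasher and Ptr<History> are the assumptions from earlier, and synchronization is left to the caller:

#include <random>
#include <unordered_map>

using Cache = std::unordered_map<marian::Words, Ptr<History>, WordsHasher>;

// Evict a pseudo-random entry once the size bound is hit, then insert.
void insertWithRandomEviction(Cache &cache, size_t maxEntries,
                              std::mt19937 &rng, const marian::Words &key,
                              Ptr<History> value) {
  if (maxEntries > 0 && cache.size() >= maxEntries) {
    // Jump to a random non-empty bucket and evict its first entry.
    size_t bucket = rng() % cache.bucket_count();
    while (cache.bucket_size(bucket) == 0) {
      bucket = (bucket + 1) % cache.bucket_count();
    }
    marian::Words victim = cache.begin(bucket)->first;  // copy key before erase
    cache.erase(victim);
  }
  cache[key] = value;
}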

@kpu
Member

kpu commented Jul 26, 2021

https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic2 would be nice but it's C++20.
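
For reference, the shape that would take with C++20's std::atomic partial specialization for std::shared_ptr; History here is marian's type:

#include <atomic>
#include <memory>

// A cache slot whose History can be read and replaced without a mutex.
struct CacheSlot {
  std::atomic<std::shared_ptr<marian::History>> history;
};

std::shared_ptr<marian::History> read(const CacheSlot &slot) {
  return slot.history.load();  // consistent snapshot of the pointer
}

void publish(CacheSlot &slot, std::shared_ptr<marian::History> h) {
  slot.history.store(std::move(h));  // atomically swap in the new History
}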

@jerinphilip
Contributor Author

Oh, we fixed this in #227.
