Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate workspace from graph #257

Open
jerinphilip opened this issue Nov 3, 2021 · 5 comments
Open

Separate workspace from graph #257

jerinphilip opened this issue Nov 3, 2021 · 5 comments
Labels
high-priority Needs to be done at the earliest mod: bergamot Changes affecting bergamot-library mod: marian Changes affecting marian-dev component

Comments

@jerinphilip
Copy link
Contributor

jerinphilip commented Nov 3, 2021

This issue documents the progress on separating workspace from the graph for purposes of making multiple models translation more memory efficient. This is the bergamot-translator counterpart of browsermt/marian-dev#57.

Background

For #209, in order to provide Mozilla more dev time, we checked in an inefficient but API-guaranteed (no changes) implementation supporting multiple models. By supporting multiple models, we mean the extension can hold models and manage them using a model required during translation. This was necessary for outbound translation, for which simultaneously two models were needed:

  • One to translate the existing page (eg: an English speaker trying to read a French page in English).
  • One to translate the inverse direction (eg: the same English speaker trying to input data into a French page in English, translated into French before they're sent out because the French website expects input in French).

The API works for this use case without having to kill and respawn a Service by parameterizing translate(...) calls by models.

(This is, not to be confused with "pivot-translation", where two models are used back to back going from languages A -> P; P -> B to translate A -> B. Pivoting requires stronger guarantees of sentence consistency for the functionality to seamlessly work.)

Inefficiency

Due to certain design issues in the Marian primitives available to us (where Marian only ever assumes one model, or an ensemble of models of the same language direction) for time being we have workspaces per model per thread while in an ideal scenario we only have workspaces per threads (for thread safety) and models share the workspaces as required.

For purposes of this issue, a graph describes the operations on a neural network and a workspace describes the storage used to store tensors (intermediate while computation - input matrices, activations following inputs etc, hence "workspace"). In simple words, Marian's workspace sits behind a graph (ie, each graph will have a constituent workspace). If we disconnect the graph (operations) from the workspace (storage) and bind the workspace to threads/workers we will have an efficient design. The most effort here is finding how best to cut Marian's existing API to suit this case.

@jerinphilip jerinphilip added mod: marian Changes affecting marian-dev component mod: bergamot Changes affecting bergamot-library labels Nov 3, 2021
@jerinphilip
Copy link
Contributor Author

jerinphilip commented Nov 3, 2021

So far, following #210 we've been pursuing how best to approach separating graph from the workspace.

#223 is a failed attempt at storing tensors in the same graph (there are thread-safety and leak issues here).
Another ongoing effort involves inserting an injection on TensorAllocator (which is what a workspace effectively boils down to) and is available in #225 with corresponding Marian changes in browsermt/marian-dev#58.

Recent internal discussions provides the following pointers from @kpu:

  1. Scorers are light weight, it’s the memory behind them
  2. setworkspace switching out memory. Shortlist anything allocator of the graph is put into workspace.
  3. WrappedDevice might potentially be useful.
  4. Keep a scorer for each model
  5. One workspace per thread
  6. Scorers are not threadsafe cause they own a graph, but have one per model per thread
  7. Use WrappedDevice

@XapaJIaMnu suspects we might be using "Workspace" to store what are more permanent tensors which should stay with TranslationModel, like a prepared matrix for intgemm or something for shortlist.

@jerinphilip
Copy link
Contributor Author

jerinphilip commented Nov 3, 2021

@XapaJIaMnu suspects we might be using "Workspace" to store what are more permanent tensors which should stay with TranslationModel, like a prepared matrix for intgemm or something for shortlist.

The next action-item is to

  • check if this is indeed the case, by trying a marian-model without intgemm (fp32) and without shortlist and checking if the translations are intact.

The above will be accomplished building on top of #225. concurrentMultimodelsIntensive runs multiple models in parallel over threads and checks if the output is consistent with how it would be if they were run on a single-thread with no other model operating on the shared-workspace. If a second model overwrites something on the workspace that belongs to the first model this would corrupt the translations. #225 also provides additional logging through browsermt/marian-dev#58 on when tensors are cleared or created.

@jerinphilip
Copy link
Contributor Author

Adding some more information on related to this here:

Everything is intgemm interface, if you fix that you would be able to work with on-demand models (fp32 npzs). Get things to work with fp32 models without a shortlist. Next step, fp32 with a shortlist, it will require changing this spaghetti. https://github.com/browsermt/marian-dev/blob/200e81c0cc88259c540b96afc6e0867cb05570b0/src/layers/generic.cpp#L294

@XapaJIaMnu
Copy link
Collaborator

THe layers generic is not quite related to this issue (somewhat related, but not completely relevant)

@jerinphilip
Copy link
Contributor Author

jerinphilip commented Jan 31, 2022

Note to self: base models without 8-bit stuff are available for download from https://github.com/browsermt/students/blob/962d419729ff5145b2ba30be182c1e90991ca7a4/deen/download-models.sh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-priority Needs to be done at the earliest mod: bergamot Changes affecting bergamot-library mod: marian Changes affecting marian-dev component
Projects
None yet
Development

No branches or pull requests

2 participants