Add backends: ONNX & OpenVINO + ONNX optimization, quantization #2712

Merged
merged 22 commits into from
Oct 10, 2024

helena-intel
Contributor

Add OpenVINO support for SentenceTransformer models.

  • Add backend="openvino" to use OpenVINO. OpenVINO models can be loaded directly, or converted on the fly from PyTorch models on the Hugging Face hub.
  • Use an OpenVINO config with model_kwargs={"ov_config": config} where config can either be a dictionary or a path to a .json file
  • Use an Intel iGPU or dGPU for inference with model_kwargs={"device": "GPU"}. (The device argument of SentenceTransformer expects a PyTorch device. Supporting Intel GPUs through that argument directly would require more code changes, with backend checks. If that is preferred, I'm happy to add it.)
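
The options above can be combined as in the following minimal sketch. It assumes the argument names exactly as given in this description (backend, model_kwargs, ov_config, device). The helper names openvino_init_kwargs and load_openvino_model, the model id, and the PERFORMANCE_HINT setting are illustrative choices, not part of the PR, and the merged API may differ in detail.

```python
def openvino_init_kwargs(ov_config=None, device=None):
    """Build keyword arguments for SentenceTransformer(..., backend="openvino").

    ov_config: a dict or a path to a .json file, passed through unchanged.
    device: an OpenVINO device name such as "GPU". It goes into model_kwargs
        because SentenceTransformer's own device argument expects a PyTorch device.
    """
    model_kwargs = {}
    if ov_config is not None:
        model_kwargs["ov_config"] = ov_config
    if device is not None:
        model_kwargs["device"] = device
    return {"backend": "openvino", "model_kwargs": model_kwargs}


def load_openvino_model(model_name, **options):
    """Hypothetical convenience wrapper around the SentenceTransformer constructor."""
    # Imported lazily so the helper above works without the optional dependencies.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, **openvino_init_kwargs(**options))


if __name__ == "__main__":
    # Example model id and OpenVINO setting, chosen for illustration only.
    # Add device="GPU" on a machine with an Intel GPU.
    model = load_openvino_model(
        "sentence-transformers/all-MiniLM-L6-v2",
        ov_config={"PERFORMANCE_HINT": "LATENCY"},
    )
    print(model.encode(["OpenVINO backend smoke test"]).shape)
```

Keeping the OpenVINO device inside model_kwargs leaves the PyTorch-oriented device argument untouched, which matches the trade-off described in the last bullet.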

Documentation is still to be done. Should I add an .rst file to docs/sentence_transformer/usage? Here is basic documentation on how to use the OpenVINO backend, along with an example of how to quantize a sentence-transformers model with NNCF and use it with sentence-transformers and the OpenVINO backend: https://gist.github.com/helena-intel/fe7ea16bc015a3d581f3a7417a35a87e

Limitations:

  • T5 models are not yet supported. optimum-intel plans to refactor seq2seq models; T5 models can be added once this refactoring is done.
  • This PR only supports SentenceTransformer. CrossEncoder support could be added in a new PR.

@michaelfeil
Contributor

michaelfeil commented Jun 9, 2024

@helena-intel

Thanks! I am not really a reviewer, just saw this PR by chance.

A few concerns:

  • OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?
  • How well would the abstractions you introduced hold up for other providers (plain ONNX / the AWS Neuron stuff / other implementations)?
  • Doesn't openvino ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

@helena-intel
Contributor Author

@michaelfeil Thanks for your comments!

OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?

No, it supports both PyTorch models and OpenVINO IR models. If a path to a PyTorch model is provided, it will be converted to OpenVINO IR on the fly.
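
As a rough illustration of that decision, the sketch below checks a local model directory for an OpenVINO IR file and reports whether an export would be needed. The file name "openvino_model.xml" is the one mentioned later in this thread. The function name needs_export and the recursive search are my assumptions, and this is not the code from the PR (which, per the later comments, also checks remote repositories).

```python
from pathlib import Path

# IR file name taken from the discussion in this PR.
OPENVINO_IR_FILE = "openvino_model.xml"


def needs_export(model_dir):
    """Return True when no OpenVINO IR file exists under a local model directory.

    Hypothetical helper: it only inspects the local filesystem and searches
    subfolders too, since the IR file may live in a backend-specific folder.
    """
    path = Path(model_dir)
    if not path.is_dir():
        raise NotADirectoryError(f"{model_dir!r} is not a local directory")
    return not any(path.rglob(OPENVINO_IR_FILE))
```

A loader could call this first and only trigger the on-the-fly conversion when it returns True.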

How well would the abstractions you introduced hold up for other providers (plain ONNX / the AWS Neuron stuff / other implementations)?

I added a backend parameter instead of hardcoding to OpenVINO to make it easy to add other backends too. It should be easy for all Optimum backends. There are some specifics to OpenVINO (e.g. specific configuration settings, supporting exporting on the fly) so the _load_openvino_model() method is specific for that, but the principle of loading models with Optimum is the same for all backends.
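
To make the idea concrete, here is a minimal sketch of backend dispatch under my own assumptions. Only the backend parameter and the name _load_openvino_model come from this PR. The registry, the other loader names, and the stub return values are hypothetical stand-ins for the real loading code.

```python
def _load_torch_model(name, **model_kwargs):
    # Stub: stands in for the default PyTorch loading path.
    return ("torch", name, model_kwargs)


def _load_openvino_model(name, **model_kwargs):
    # Stub: the real method would handle ov_config and on-the-fly export.
    return ("openvino", name, model_kwargs)


# Hypothetical registry: supporting another backend means adding one entry.
_BACKEND_LOADERS = {
    "torch": _load_torch_model,
    "openvino": _load_openvino_model,
}


def load_model(name, backend="torch", **model_kwargs):
    """Pick the loader for the requested backend and forward model_kwargs to it."""
    try:
        loader = _BACKEND_LOADERS[backend]
    except KeyError:
        supported = ", ".join(sorted(_BACKEND_LOADERS))
        raise ValueError(f"Unknown backend {backend!r}; expected one of: {supported}") from None
    return loader(name, **model_kwargs)
```

Backend-specific details stay inside each loader, so the shared entry point does not need to change when a backend is added.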

I'm also open to suggestions for a different implementation!

Doesn't openvino ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

Yes, pip install optimum[openvino] and pip install optimum-intel[openvino] both install optimum-intel and all recommended dependencies for running OpenVINO models, including NNCF for model quantization and openvino-tokenizers. To run the test I added, OpenVINO alone is enough.

@helena-intel helena-intel force-pushed the helena/openvino-support branch from c8c8906 to 4c26bad Compare August 24, 2024 08:50
@tomaarsen
Collaborator

tomaarsen commented Sep 30, 2024

Hello @helena-intel!

Apologies for the radio silence so far. In truth, I've been quietly experimenting with your work and expanding it a bit further. Some of my changes:

  • Add ONNX backend with the same signature as OpenVINO
  • Improve remote model support (e.g. previously we decided whether to export based only on whether "openvino_model.xml" existed locally; now we also check whether it exists remotely)
  • Add create_pr to push_to_hub, making it easier to make pull requests to existing models to add OpenVINO/ONNX models.
  • Encourage users to call model.save_pretrained and model.push_to_hub so that models do not have to be re-exported each time. I think this should help increase the number of OpenVINO models on the Hub.
  • Add helper function to optimize ONNX models via Optimum. I'm open to more optimization helper functions for OpenVINO as well.
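
A sketch of how these pieces could fit together is shown below. It assumes the helper is exposed as export_optimized_onnx_model with the keyword arguments used here, and that "O1" through "O4" are the accepted optimization levels. Both match my understanding of the released library but should be checked against the installed version. The wrapper export_onnx and the level check are hypothetical additions for illustration.

```python
VALID_ONNX_LEVELS = ("O1", "O2", "O3", "O4")


def check_optimization_level(level):
    """Fail fast on an unknown level before any model is downloaded or exported."""
    if level not in VALID_ONNX_LEVELS:
        raise ValueError(f"Expected one of {VALID_ONNX_LEVELS}, got {level!r}")
    return level


def export_onnx(model_id, output_dir, level="O3", repo_id=None):
    """Load with the ONNX backend, save the export, optimize it, optionally open a Hub PR."""
    check_optimization_level(level)
    # Imported lazily so this module can be imported without the optional dependencies.
    from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

    model = SentenceTransformer(model_id, backend="onnx")
    # Saving the exported model avoids repeating the export on the next load.
    model.save_pretrained(output_dir)
    # Assumed signature; verify against the installed sentence-transformers version.
    export_optimized_onnx_model(
        model, optimization_config=level, model_name_or_path=output_dir
    )
    if repo_id is not None:
        # create_pr=True opens a pull request instead of pushing directly.
        model.push_to_hub(repo_id, create_pr=True)
    return model
```

Opening a pull request instead of pushing directly is what makes it practical to contribute exported files to model repositories owned by someone else.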

Feel free to let me know what you think.
P.s. I'm still ironing out the last test failures, and I still have to incorporate this all in some documentation.

- Tom Aarsen

@tomaarsen tomaarsen changed the title Add OpenVINO support Add backends: ONNX & OpenVINO + ONNX optimization, quantization Oct 10, 2024
@tomaarsen tomaarsen merged commit adbf0ba into UKPLab:master Oct 10, 2024
11 checks passed
@helena-intel
Contributor Author

Thanks so much for all your work on this, @tomaarsen! I was on vacation for the past few weeks and missed a notification. I am really excited to see this!

@tomaarsen
Collaborator

Gladly! I'm looking forward to seeing the community adopt the new backends more; I think they'll be very valuable.

- Tom Aarsen
