Add backends: ONNX & OpenVINO + ONNX optimization, quantization #2712
Thanks! I am not really a reviewer, just saw this PR by chance. Two concerns:
@michaelfeil Thanks for your comments!
No, it supports both PyTorch models and OpenVINO IR models. If a path to a PyTorch model is provided, it will be converted to OpenVINO IR on the fly.
I added a [...]. I'm also open to suggestions for a different implementation!
Yes,
c8c8906 to 4c26bad
Also update push_to_hub
Incompatible with OpenVINO imports; still works independently
Hello @helena-intel! Apologies for the radio silence so far. In truth, I've been quietly experimenting with your work and expanding it a bit further. Some of my changes:
Feel free to let me know what you think.
Relies on the upcoming optimum and optimum-intel versions; this is expected to fail until then.
Thanks so much for all your work on this @tomaarsen! I was on vacation for the past weeks and missed a notification. I am really excited to see this!
Gladly! I'm looking forward to seeing the community adopt the new backends more, I think they'll be very valuable.
Add OpenVINO support for SentenceTransformer models.
- Use `backend="openvino"` to use OpenVINO. OpenVINO models can be loaded directly, or converted on the fly from PyTorch models on the Hugging Face hub.
- Pass `model_kwargs={"ov_config": config}`, where config can either be a dictionary or a path to a .json file.
- Use `model_kwargs={"device": "GPU"}` to run on an Intel GPU. (The `device` argument for `SentenceTransformer` expects a PyTorch device. It would require more code modifications with `if backend` checks to support using the `device` argument directly to enable Intel GPU. If that is preferred I'm happy to add that.)

Documentation is to be done. Should I add an .rst file to docs/sentence_transformer/usage? Here is basic documentation on how to use the OpenVINO backend, and an example of how to quantize a sentence-transformers model with NNCF and use that with sentence-transformers and the OpenVINO backend: https://gist.github.com/helena-intel/fe7ea16bc015a3d581f3a7417a35a87e
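To make the options above concrete, here is a minimal sketch of how the arguments could be assembled. The helpers `resolve_ov_config`, `build_openvino_kwargs` and `load_openvino_model` are hypothetical names introduced only for illustration and are not part of this PR. The model name and the `INFERENCE_PRECISION_HINT` setting in the commented usage are assumed examples. Per the description, the backend itself accepts either a dictionary or a .json path for `ov_config`; the helper only normalizes the value up front so a bad path fails early.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union


def resolve_ov_config(config: Union[Dict[str, Any], str, Path]) -> Dict[str, Any]:
    """Hypothetical helper: return an OpenVINO config dict from a dict or a .json path."""
    if isinstance(config, dict):
        return dict(config)  # copy so the caller's dict is not mutated later
    path = Path(config)
    if not path.is_file():
        raise ValueError(f"ov_config must be a dict or an existing .json file, got: {config}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def build_openvino_kwargs(
    ov_config: Optional[Union[Dict[str, Any], str, Path]] = None,
    device: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the SentenceTransformer keyword arguments described above."""
    model_kwargs: Dict[str, Any] = {}
    if ov_config is not None:
        model_kwargs["ov_config"] = resolve_ov_config(ov_config)
    if device is not None:
        # OpenVINO device name such as "GPU"; this is passed via model_kwargs,
        # not via the PyTorch-style `device` argument of SentenceTransformer.
        model_kwargs["device"] = device
    return {"backend": "openvino", "model_kwargs": model_kwargs}


def load_openvino_model(model_name_or_path: str, **options: Any):
    """Load a model with the OpenVINO backend (needs sentence-transformers with OpenVINO extras)."""
    # Imported lazily so the helpers above work without sentence-transformers installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name_or_path, **build_openvino_kwargs(**options))


# Example usage (assumed model name and config value; downloads the model on first run):
# model = load_openvino_model(
#     "sentence-transformers/all-MiniLM-L6-v2",
#     ov_config={"INFERENCE_PRECISION_HINT": "f32"},
# )
# embeddings = model.encode(["OpenVINO backend example"])
```

Keeping the OpenVINO device inside `model_kwargs` mirrors the approach described above, where the top-level `device` argument is left for PyTorch devices.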
Limitations: