Add backends: ONNX & OpenVINO + ONNX optimization, quantization #2712

Merged: 22 commits, Oct 10, 2024
4b31bfa
Add OpenVINO support
helena-intel Jun 4, 2024
4c26bad
Fix OpenVINO test on Windows
helena-intel Jun 4, 2024
581ee19
Expand OpenVino support (remote models); add ONNX backend; tests
tomaarsen Sep 30, 2024
1a9b1c0
Move OV test to test_backends
tomaarsen Sep 30, 2024
f91747f
Merge branch 'master' into pr-2712
tomaarsen Sep 30, 2024
f03deff
Update push_to_hub test monkeypatching
tomaarsen Sep 30, 2024
1e9c3fa
Remove some dead code
tomaarsen Sep 30, 2024
e55dd4a
Skip multi-process tests for now
tomaarsen Sep 30, 2024
6b4b519
Move export_optimized_onnx_model to backend.py
tomaarsen Sep 30, 2024
f9d8f9d
Update __init__ to address the export_optimized_onnx_model move
tomaarsen Sep 30, 2024
bcd5dd7
Remove dot in commit message
tomaarsen Sep 30, 2024
02e8e27
Add PR description for export_optimized_onnx_model
tomaarsen Sep 30, 2024
3aa86e3
OpenVINO will override export=False; update tests
tomaarsen Sep 30, 2024
4cbb727
Add dynamic quantization exporting; docs; benchmarks, etc.
tomaarsen Oct 8, 2024
bc4caa6
Require 4.41.0 for eval_strategy, etc.
tomaarsen Oct 8, 2024
d213510
Restrict optimum-intel rather than optimum
tomaarsen Oct 8, 2024
25dd01d
Use subfolder rather than relying only on file_name
tomaarsen Oct 9, 2024
ea0ec5b
Add link to OVBaseModel.from_pretrained
tomaarsen Oct 9, 2024
23bbabd
Add tips pointing to the new efficiency docs
tomaarsen Oct 10, 2024
124ca2c
Another pointer to the new efficiency docs
tomaarsen Oct 10, 2024
0a037e9
Expand the benchmark details
tomaarsen Oct 10, 2024
57d3049
Update min. requirements to optimum 1.23.0 & optimum-intel 1.20.0
tomaarsen Oct 10, 2024
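The commits above add ONNX and OpenVINO as alternative inference backends selected via a `backend` argument. As a hedged sketch (the model name and environment guard are illustrative, not from this PR), loading a model with one of the new backends might look like:

```python
import os

# Illustrative sketch of the ONNX / OpenVINO backends added in this PR.
# Guarded so the export only runs when sentence-transformers (with the onnx
# extra) is installed and RUN_BACKEND_DEMO=1 is set, since it downloads weights.
try:
    from sentence_transformers import SentenceTransformer
    st_available = True
except ImportError:
    st_available = False

if st_available and os.environ.get("RUN_BACKEND_DEMO") == "1":
    # backend defaults to "torch"; "onnx" and "openvino" use the new extras
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
    embeddings = model.encode(["The weather is lovely today."])
    print(embeddings.shape)
```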
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -61,7 +61,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
- python -m pip install '.[dev,train]'
+ python -m pip install '.[train, onnx, openvino, dev]'

- name: Run unit tests
run: |
1 change: 1 addition & 0 deletions docs/_static/css/custom.css
@@ -41,6 +41,7 @@ dl.class > dt {
border-color: rgb(55 65 81);
background-color: #e3e3e3;
color: #404040; /* Override the colors imposed by <a href> */
max-width: 18rem;
}

.components > .box:nth-child(1) > .header {
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -43,6 +43,7 @@
"sphinx.ext.intersphinx",
"sphinx.ext.linkcode",
"sphinx_inline_tabs",
"sphinxcontrib.mermaid",
]

# Add any paths that contain templates here, relative to this directory.
@@ -68,6 +69,7 @@
"datasets": ("https://huggingface.co/docs/datasets/main/en/", None),
"transformers": ("https://huggingface.co/docs/transformers/main/en/", None),
"huggingface_hub": ("https://huggingface.co/docs/huggingface_hub/main/en/", None),
"optimum": ("https://huggingface.co/docs/optimum/main/en/", None),
"torch": ("https://pytorch.org/docs/stable/", None),
}

Binary file added docs/img/backends_benchmark_cpu.png
Binary file added docs/img/backends_benchmark_gpu.png
62 changes: 60 additions & 2 deletions docs/installation.md
@@ -1,10 +1,14 @@
# Installation

-We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.34.0+](https://github.com/huggingface/transformers)**. There are three options to install Sentence Transformers:
+We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.41.0+](https://github.com/huggingface/transformers)**. There are 5 extra options to install Sentence Transformers:
* **Default:** This allows for loading, saving, and inference (i.e., getting embeddings) of models.
-* **Default and Training**: All of the above plus training.
+* **ONNX:** This allows for loading, saving, inference, optimizing, and quantizing of models using the ONNX backend.
+* **OpenVINO:** This allows for loading, saving, and inference of models using the OpenVINO backend.
+* **Default and Training**: Like **Default**, plus training.
* **Development**: All of the above plus some dependencies for developing Sentence Transformers, see [Editable Install](#editable-install).

Note that you can mix and match the various extras, e.g. ``pip install -U "sentence-transformers[train, onnx-gpu]"``
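As the note says, the extras compose freely. A tiny hypothetical helper (not part of the library, purely illustrative) that builds such a command string:

```python
def pip_command(extras):
    """Build a pip install command for the given Sentence Transformers extras."""
    spec = "sentence-transformers"
    if extras:
        # Extras are comma-separated inside square brackets, per PEP 508.
        spec += "[" + ", ".join(extras) + "]"
    return f'pip install -U "{spec}"'

print(pip_command(["train", "onnx-gpu"]))
# pip install -U "sentence-transformers[train, onnx-gpu]"
```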

## Install with pip

```eval_rst
@@ -15,6 +19,24 @@ We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-starte

pip install -U sentence-transformers

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu]"

For CPU only:
::

pip install -U "sentence-transformers[onnx]"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino]"

.. tab:: Default and Training

::
@@ -47,6 +69,24 @@ We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-starte

conda install -c conda-forge sentence-transformers

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu]"

For CPU only:
::

pip install -U "sentence-transformers[onnx]"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino]"

.. tab:: Default and Training

::
@@ -81,6 +121,24 @@ You can install ``sentence-transformers`` directly from source to take advantage

pip install git+https://github.com/UKPLab/sentence-transformers.git

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu] @ git+https://github.com/UKPLab/sentence-transformers.git"

For CPU only:
::

pip install -U "sentence-transformers[onnx] @ git+https://github.com/UKPLab/sentence-transformers.git"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino] @ git+https://github.com/UKPLab/sentence-transformers.git"

.. tab:: Default and Training

::
6 changes: 6 additions & 0 deletions docs/package_reference/util.md
@@ -7,6 +7,12 @@
:members: paraphrase_mining, semantic_search, community_detection, http_get, truncate_embeddings, normalize_embeddings, is_training_available, mine_hard_negatives
```

## Model Optimization
```eval_rst
.. automodule:: sentence_transformers.backend
:members: export_optimized_onnx_model, export_dynamic_quantized_onnx_model
```
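A hedged sketch of calling the two helpers documented above: "O3" and "avx512_vnni" are example configuration names, the output directories are illustrative, and the export is guarded because it downloads and converts model weights.

```python
import os

# Sketch: ONNX graph optimization and dynamic quantization via the new helpers.
try:
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.backend import (
        export_dynamic_quantized_onnx_model,
        export_optimized_onnx_model,
    )
    backend_available = True
except ImportError:  # sentence-transformers[onnx] not installed
    backend_available = False

if backend_available and os.environ.get("RUN_ONNX_EXPORT") == "1":
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
    # "O3" names an ONNX Runtime graph optimization level
    export_optimized_onnx_model(model, "O3", "minilm-optimized")
    # "avx512_vnni" requests dynamic int8 quantization for that instruction set
    export_dynamic_quantized_onnx_model(model, "avx512_vnni", "minilm-quantized")
```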

## Similarity Metrics

```eval_rst
7 changes: 6 additions & 1 deletion docs/quickstart.rst
@@ -23,6 +23,7 @@ Once you have `installed <installation.html>`_ Sentence Transformers, you can ea

- :meth:`SentenceTransformer.similarity_pairwise <sentence_transformers.SentenceTransformer.similarity_pairwise>`
- `SentenceTransformer > Usage <./sentence_transformer/usage/usage.html>`_
- `SentenceTransformer > Usage > Speeding up Inference <./sentence_transformer/usage/efficiency.html>`_
- `SentenceTransformer > Pretrained Models <./sentence_transformer/pretrained_models.html>`_
- `SentenceTransformer > Training Overview <./sentence_transformer/training_overview.html>`_
- `SentenceTransformer > Dataset Overview <./sentence_transformer/dataset_overview.html>`_
@@ -55,10 +56,14 @@ Once you have `installed <installation.html>`_ Sentence Transformers, you can ea
# [0.6660, 1.0000, 0.1411],
# [0.1046, 0.1411, 1.0000]])

-With ``SentenceTransformer("all-MiniLM-L6-v2")`` we pick which `Sentence Transformer model <https://huggingface.co/models?library=sentence-transformers>`_ we load. In this example, we load `all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_, which is a MiniLM model finetuned on a large dataset of over 1 billion training pairs. Using `SentenceTransformer.similarity() <./package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity>`_, we compute the similarity between all pairs of sentences. As expected, the similarity between the first two sentences (0.6660) is higher than the similarity between the first and the third sentence (0.1046) or the second and the third sentence (0.1411).
+With ``SentenceTransformer("all-MiniLM-L6-v2")`` we pick which `Sentence Transformer model <https://huggingface.co/models?library=sentence-transformers>`_ we load. In this example, we load `all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_, which is a MiniLM model finetuned on a large dataset of over 1 billion training pairs. Using :meth:`SentenceTransformer.similarity() <sentence_transformers.SentenceTransformer.similarity>`, we compute the similarity between all pairs of sentences. As expected, the similarity between the first two sentences (0.6660) is higher than the similarity between the first and the third sentence (0.1046) or the second and the third sentence (0.1411).

Finetuning Sentence Transformer models is easy and requires only a few lines of code. For more information, see the `Training Overview <./sentence_transformer/training_overview.html>`_ section.

.. tip::

Read `Sentence Transformer > Usage > Speeding up Inference <sentence_transformer/usage/efficiency.html>`_ for tips on how to speed up inference of models by up to 2x-3x.

Cross Encoder
-------------

1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -6,4 +6,5 @@ sphinx_markdown_tables==0.0.17
recommonmark==0.7.1
sphinx-copybutton==0.5.2
sphinx_inline_tabs==2023.4.21
sphinxcontrib-mermaid==0.8.1
-e ..
4 changes: 4 additions & 0 deletions docs/sentence_transformer/pretrained_models.md
@@ -31,6 +31,10 @@ similarities = model.similarity(embeddings, embeddings)

- **Model sizes**: it is recommended to filter away the large models that might not be feasible without excessive hardware.
- **Experimentation is key**: models that perform well on the leaderboard do not necessarily do well on your tasks, it is **crucial** to experiment with various promising models.

.. tip::

Read `Sentence Transformer > Usage > Speeding up Inference <./usage/efficiency.html>`_ for tips on how to speed up inference of models by up to 2x-3x.
```

## Original Models