Add backends: ONNX & OpenVINO + ONNX optimization, quantization #2712

Merged: 22 commits, Oct 10, 2024
4b31bfa
Add OpenVINO support
helena-intel Jun 4, 2024
4c26bad
Fix OpenVINO test on Windows
helena-intel Jun 4, 2024
581ee19
Expand OpenVino support (remote models); add ONNX backend; tests
tomaarsen Sep 30, 2024
1a9b1c0
Move OV test to test_backends
tomaarsen Sep 30, 2024
f91747f
Merge branch 'master' into pr-2712
tomaarsen Sep 30, 2024
f03deff
Update push_to_hub test monkeypatching
tomaarsen Sep 30, 2024
1e9c3fa
Remove some dead code
tomaarsen Sep 30, 2024
e55dd4a
Skip multi-process tests for now
tomaarsen Sep 30, 2024
6b4b519
Move export_optimized_onnx_model to backend.py
tomaarsen Sep 30, 2024
f9d8f9d
Update __init__ to address the export_optimized_onnx_model move
tomaarsen Sep 30, 2024
bcd5dd7
Remove dot in commit message
tomaarsen Sep 30, 2024
02e8e27
Add PR description for export_optimized_onnx_model
tomaarsen Sep 30, 2024
3aa86e3
OpenVINO will override export=False; update tests
tomaarsen Sep 30, 2024
4cbb727
Add dynamic quantization exporting; docs; benchmarks, etc.
tomaarsen Oct 8, 2024
bc4caa6
Require 4.41.0 for eval_strategy, etc.
tomaarsen Oct 8, 2024
d213510
Restrict optimum-intel rather than optimum
tomaarsen Oct 8, 2024
25dd01d
Use subfolder rather than relying only on file_name
tomaarsen Oct 9, 2024
ea0ec5b
Add link to OVBaseModel.from_pretrained
tomaarsen Oct 9, 2024
23bbabd
Add tips pointing to the new efficiency docs
tomaarsen Oct 10, 2024
124ca2c
Another pointer to the new efficiency docs
tomaarsen Oct 10, 2024
0a037e9
Expand the benchmark details
tomaarsen Oct 10, 2024
57d3049
Update min. requirements to optimum 1.23.0 & optimum-intel 1.20.0
tomaarsen Oct 10, 2024
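The commits above add ONNX and OpenVINO as alternative inference backends selected via a `backend` argument. As a hedged sketch (the model name and environment guard are illustrative, not from this PR), loading a model with one of the new backends might look like:

```python
import os

# Illustrative sketch of the ONNX / OpenVINO backends added in this PR.
# Guarded so the export only runs when sentence-transformers (with the onnx
# extra) is installed and RUN_BACKEND_DEMO=1 is set, since it downloads weights.
try:
    from sentence_transformers import SentenceTransformer
    st_available = True
except ImportError:
    st_available = False

if st_available and os.environ.get("RUN_BACKEND_DEMO") == "1":
    # backend defaults to "torch"; "onnx" and "openvino" use the new extras
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
    embeddings = model.encode(["The weather is lovely today."])
    print(embeddings.shape)
```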
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -61,7 +61,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
- python -m pip install '.[dev,train]'
+ python -m pip install '.[train, onnx, openvino, dev]'

- name: Run unit tests
run: |
1 change: 1 addition & 0 deletions docs/_static/css/custom.css
@@ -41,6 +41,7 @@ dl.class > dt {
border-color: rgb(55 65 81);
background-color: #e3e3e3;
color: #404040; /* Override the colors imposed by <a href> */
max-width: 18rem;
}

.components > .box:nth-child(1) > .header {
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -43,6 +43,7 @@
"sphinx.ext.intersphinx",
"sphinx.ext.linkcode",
"sphinx_inline_tabs",
"sphinxcontrib.mermaid",
]

# Add any paths that contain templates here, relative to this directory.
@@ -68,6 +69,7 @@
"datasets": ("https://huggingface.co/docs/datasets/main/en/", None),
"transformers": ("https://huggingface.co/docs/transformers/main/en/", None),
"huggingface_hub": ("https://huggingface.co/docs/huggingface_hub/main/en/", None),
"optimum": ("https://huggingface.co/docs/optimum/main/en/", None),
"torch": ("https://pytorch.org/docs/stable/", None),
}

Binary file added docs/img/backends_benchmark_cpu.png
Binary file added docs/img/backends_benchmark_gpu.png
62 changes: 60 additions & 2 deletions docs/installation.md
@@ -1,10 +1,14 @@
# Installation

-We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.34.0+](https://github.com/huggingface/transformers)**. There are three options to install Sentence Transformers:
+We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.41.0+](https://github.com/huggingface/transformers)**. There are 5 extra options to install Sentence Transformers:
* **Default:** This allows for loading, saving, and inference (i.e., getting embeddings) of models.
-* **Default and Training**: All of the above plus training.
+* **ONNX:** This allows for loading, saving, inference, optimizing, and quantizing of models using the ONNX backend.
+* **OpenVINO:** This allows for loading, saving, and inference of models using the OpenVINO backend.
+* **Default and Training**: Like **Default**, plus training.
* **Development**: All of the above plus some dependencies for developing Sentence Transformers, see [Editable Install](#editable-install).

Note that you can mix and match the various extras, e.g. ``pip install -U "sentence-transformers[train, onnx-gpu]"``
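As the note says, the extras compose freely. A tiny hypothetical helper (not part of the library, purely illustrative) that builds such a command string:

```python
def pip_command(extras):
    """Build a pip install command for the given Sentence Transformers extras."""
    spec = "sentence-transformers"
    if extras:
        # Extras are comma-separated inside square brackets, per PEP 508.
        spec += "[" + ", ".join(extras) + "]"
    return f'pip install -U "{spec}"'

print(pip_command(["train", "onnx-gpu"]))
# pip install -U "sentence-transformers[train, onnx-gpu]"
```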

## Install with pip

```eval_rst
@@ -15,6 +19,24 @@ We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-starte

pip install -U sentence-transformers

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu]"

For CPU only:
::

pip install -U "sentence-transformers[onnx]"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino]"

.. tab:: Default and Training

::
@@ -47,6 +69,24 @@ We recommend **Python 3.8+**, **[PyTorch 1.11.0+](https://pytorch.org/get-starte

conda install -c conda-forge sentence-transformers

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu]"

For CPU only:
::

pip install -U "sentence-transformers[onnx]"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino]"

.. tab:: Default and Training

::
@@ -81,6 +121,24 @@ You can install ``sentence-transformers`` directly from source to take advantage

pip install git+https://github.com/UKPLab/sentence-transformers.git

.. tab:: ONNX

For GPU and CPU:
::

pip install -U "sentence-transformers[onnx-gpu] @ git+https://github.com/UKPLab/sentence-transformers.git"

For CPU only:
::

pip install -U "sentence-transformers[onnx] @ git+https://github.com/UKPLab/sentence-transformers.git"

.. tab:: OpenVINO

::

pip install -U "sentence-transformers[openvino] @ git+https://github.com/UKPLab/sentence-transformers.git"

.. tab:: Default and Training

::
6 changes: 6 additions & 0 deletions docs/package_reference/util.md
@@ -7,6 +7,12 @@
:members: paraphrase_mining, semantic_search, community_detection, http_get, truncate_embeddings, normalize_embeddings, is_training_available, mine_hard_negatives
```

## Model Optimization
```eval_rst
.. automodule:: sentence_transformers.backend
:members: export_optimized_onnx_model, export_dynamic_quantized_onnx_model
```
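A hedged sketch of calling the two helpers documented above: "O3" and "avx512_vnni" are example configuration names, the output directories are illustrative, and the export is guarded because it downloads and converts model weights.

```python
import os

# Sketch: ONNX graph optimization and dynamic quantization via the new helpers.
try:
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.backend import (
        export_dynamic_quantized_onnx_model,
        export_optimized_onnx_model,
    )
    backend_available = True
except ImportError:  # sentence-transformers[onnx] not installed
    backend_available = False

if backend_available and os.environ.get("RUN_ONNX_EXPORT") == "1":
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
    # "O3" names an ONNX Runtime graph optimization level
    export_optimized_onnx_model(model, "O3", "minilm-optimized")
    # "avx512_vnni" requests dynamic int8 quantization for that instruction set
    export_dynamic_quantized_onnx_model(model, "avx512_vnni", "minilm-quantized")
```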

## Similarity Metrics

```eval_rst
7 changes: 6 additions & 1 deletion docs/quickstart.rst
@@ -23,6 +23,7 @@ Once you have `installed <installation.html>`_ Sentence Transformers, you can ea

- :meth:`SentenceTransformer.similarity_pairwise <sentence_transformers.SentenceTransformer.similarity_pairwise>`
- `SentenceTransformer > Usage <./sentence_transformer/usage/usage.html>`_
- `SentenceTransformer > Usage > Speeding up Inference <./sentence_transformer/usage/efficiency.html>`_
- `SentenceTransformer > Pretrained Models <./sentence_transformer/pretrained_models.html>`_
- `SentenceTransformer > Training Overview <./sentence_transformer/training_overview.html>`_
- `SentenceTransformer > Dataset Overview <./sentence_transformer/dataset_overview.html>`_
@@ -55,10 +56,14 @@ Once you have `installed <installation.html>`_ Sentence Transformers, you can ea
# [0.6660, 1.0000, 0.1411],
# [0.1046, 0.1411, 1.0000]])

-With ``SentenceTransformer("all-MiniLM-L6-v2")`` we pick which `Sentence Transformer model <https://huggingface.co/models?library=sentence-transformers>`_ we load. In this example, we load `all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_, which is a MiniLM model finetuned on a large dataset of over 1 billion training pairs. Using `SentenceTransformer.similarity() <./package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity>`_, we compute the similarity between all pairs of sentences. As expected, the similarity between the first two sentences (0.6660) is higher than the similarity between the first and the third sentence (0.1046) or the second and the third sentence (0.1411).
+With ``SentenceTransformer("all-MiniLM-L6-v2")`` we pick which `Sentence Transformer model <https://huggingface.co/models?library=sentence-transformers>`_ we load. In this example, we load `all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_, which is a MiniLM model finetuned on a large dataset of over 1 billion training pairs. Using :meth:`SentenceTransformer.similarity() <sentence_transformers.SentenceTransformer.similarity>`, we compute the similarity between all pairs of sentences. As expected, the similarity between the first two sentences (0.6660) is higher than the similarity between the first and the third sentence (0.1046) or the second and the third sentence (0.1411).

Finetuning Sentence Transformer models is easy and requires only a few lines of code. For more information, see the `Training Overview <./sentence_transformer/training_overview.html>`_ section.

.. tip::

Read `Sentence Transformer > Usage > Speeding up Inference <sentence_transformer/usage/efficiency.html>`_ for tips on how to speed up inference of models by up to 2x-3x.

Cross Encoder
-------------

1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -6,4 +6,5 @@ sphinx_markdown_tables==0.0.17
recommonmark==0.7.1
sphinx-copybutton==0.5.2
sphinx_inline_tabs==2023.4.21
sphinxcontrib-mermaid==0.8.1
-e ..
4 changes: 4 additions & 0 deletions docs/sentence_transformer/pretrained_models.md
@@ -31,6 +31,10 @@ similarities = model.similarity(embeddings, embeddings)

- **Model sizes**: it is recommended to filter away the large models that might not be feasible without excessive hardware.
- **Experimentation is key**: models that perform well on the leaderboard do not necessarily do well on your tasks, it is **crucial** to experiment with various promising models.

.. tip::

Read `Sentence Transformer > Usage > Speeding up Inference <./usage/efficiency.html>`_ for tips on how to speed up inference of models by up to 2x-3x.
```

## Original Models