v2.3.0 - Bug fixes, improved model loading & Cached MNRL
This release focuses on various bug fixes & improvements to keep up with adjacent works like transformers and huggingface_hub. These are the key changes in the release:
Pushing models to the Hugging Face Hub (#2376)
Prior to Sentence Transformers v2.3.0, saving models to the Hugging Face Hub may have resulted in various errors depending on the versions of the dependencies. Sentence Transformers v2.3.0 introduces a refactor to save_to_hub to resolve these issues.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
...
model.save_to_hub("tomaarsen/all-MiniLM-L6-v2-quora")
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 90.9M/90.9M [00:06<00:00, 13.7MB/s]
Upload 1 LFS files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.11s/it]
Model Loading
Efficient model loading (#2345)
Recently, transformers has shifted towards using safetensors files as its primary model file format. Additionally, various other file formats are commonly used, such as PyTorch (pytorch_model.bin), Rust (rust_model.ot), TensorFlow (tf_model.h5) and ONNX (model.onnx).
Prior to Sentence Transformers v2.3.0, almost all files of a repository would be downloaded, even if they were not strictly required. Since v2.3.0, only the strictly required files are downloaded. For example, when loading sentence-transformers/all-MiniLM-L6-v2, which has its model weights in three formats (pytorch_model.bin, rust_model.ot, tf_model.h5), only pytorch_model.bin will be downloaded. Additionally, when downloading intfloat/multilingual-e5-small with two formats (model.safetensors, pytorch_model.bin), only model.safetensors will be downloaded.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
Downloading modules.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
Downloading (…)ce_transformers.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<?, ?B/s]
Downloading README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 10.6k/10.6k [00:00<?, ?B/s]
Downloading (…)nce_bert_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<?, ?B/s]
Downloading config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 612/612 [00:00<?, ?B/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████| 90.9M/90.9M [00:06<00:00, 15.0MB/s]
Downloading tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 350/350 [00:00<?, ?B/s]
Downloading vocab.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.37MB/s]
Downloading tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 4.61MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<?, ?B/s]
Downloading 1_Pooling/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
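The selection rule described above can be sketched roughly as follows. This is only an illustration and not the library's actual implementation: the helper name pick_weight_file and the preference list are assumptions made for this example, based solely on the two cases mentioned above (safetensors is preferred over the PyTorch file, and other formats are skipped).

```python
from typing import Iterable, Optional

# Assumed preference order for this sketch: safetensors first, then PyTorch.
PREFERRED_WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]


def pick_weight_file(repo_files: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the one weight file to download, or None."""
    available = set(repo_files)
    for candidate in PREFERRED_WEIGHT_FILES:
        if candidate in available:
            return candidate
    return None


# File listings mirroring the two repositories described above.
minilm_files = ["config.json", "pytorch_model.bin", "rust_model.ot", "tf_model.h5"]
e5_files = ["config.json", "model.safetensors", "pytorch_model.bin"]

print(pick_weight_file(minilm_files))  # pytorch_model.bin
print(pick_weight_file(e5_files))      # model.safetensors
```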
Note
This release updates the default cache location from ~/.cache/torch/sentence_transformers to the default cache location of transformers, i.e. ~/.cache/huggingface. You can still specify custom cache locations via the SENTENCE_TRANSFORMERS_HOME environment variable or the cache_folder argument.
Additionally, by supporting newer versions of various dependencies (e.g. huggingface_hub), the cache format changed. As a consequence, old cached models cannot be used from v2.3.0 onwards, and those models need to be redownloaded. Once redownloaded, an airgapped machine can load the model like normal despite having no internet access.
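As a rough sketch of how a custom cache location could be resolved: the helper resolve_cache_folder below is hypothetical, and the precedence (explicit argument first, then the environment variable) is an assumption of this example. Returning None stands for "defer to the Hugging Face default cache".

```python
import os
from typing import Optional


def resolve_cache_folder(cache_folder: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit argument first, then the environment variable.

    None means that neither was provided, so the Hugging Face default applies.
    """
    if cache_folder is not None:
        return cache_folder
    return os.environ.get("SENTENCE_TRANSFORMERS_HOME")


# Neither the argument nor the environment variable is set.
os.environ.pop("SENTENCE_TRANSFORMERS_HOME", None)
print(resolve_cache_folder())                # None

# The explicit argument is used as-is.
print(resolve_cache_folder("/data/models"))  # /data/models

# The environment variable is picked up when no argument is given.
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/mnt/st_cache"
print(resolve_cache_folder())                # /mnt/st_cache
```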
Loading custom models (#2398)
This release brings support for models with custom code, such as jinaai/jina-embeddings-v2-base-en, to Sentence Transformers through the trust_remote_code argument.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
print(cos_sim(embeddings[0], embeddings[1]))
# => tensor([[0.9341]])
Loading specific revisions (#2419)
If an embedding model is ever updated, it would invalidate all of the embeddings that you have created with the prior version of that model. We promise to never update the weights of any sentence-transformers/... model, but we cannot offer this guarantee for models by the community.
That is why this version introduces a revision keyword, allowing you to specify exactly which revision or branch you'd like to load:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-small-en-v1.5", revision="982532469af0dff5df8e70b38075b0940e863662")
# or a branch:
model = SentenceTransformer("BAAI/bge-small-en-v1.5", revision="main")
Soft deprecation of use_auth_token, use token instead (#2376)
Following updates from transformers & huggingface_hub, Sentence Transformers now recommends that you use the token argument to provide your Hugging Face authentication token to download private models.
from sentence_transformers import SentenceTransformer
# new:
model = SentenceTransformer("tomaarsen/all-mpnet-base-v2", token="hf_...")
# old, still works, but throws a warning to upgrade to "token"
model = SentenceTransformer("tomaarsen/all-mpnet-base-v2", use_auth_token="hf_...")
Note
The recommended way to include your Hugging Face authentication token is to run huggingface-cli login & paste your User Access Token from your Hugging Face Settings. See these docs for more information. Then, you don't have to include the token argument at all; it'll be automatically read from your filesystem.
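The soft deprecation pattern itself can be illustrated with a small stand-alone sketch. The function resolve_token is hypothetical, and the warning category and the choice to let token win when both are given are assumptions of this example rather than confirmed library behaviour.

```python
import warnings
from typing import Optional


def resolve_token(
    token: Optional[str] = None, use_auth_token: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: prefer `token`, still accept `use_auth_token` with a warning."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, please use `token` instead.",
            FutureWarning,
        )
        # Assumption for this sketch: an explicit `token` takes precedence.
        if token is None:
            token = use_auth_token
    return token


# Record warnings so that the deprecation message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_style = resolve_token(token="hf_...")
    old_style = resolve_token(use_auth_token="hf_...")

print(new_style == old_style)  # True
print(len(caught))             # 1
```

Only the old-style call triggers a warning, while both calls end up with the same token.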
Device patch (#2351)
Prior to this release, SentenceTransformers.device
would not always correspond to the device on which embeddings were computed, or on which a model gets trained. This release brings a few fixes:
SentenceTransformers.device
now always corresponds to the device that the model is on, and on which it will do its computations.- Models are now immediately moved to their specified device, rather than lazily whenever the model is being used.
SentenceTransformers.to(...)
,SentenceTransformers.cpu()
,SentenceTransformers.cuda()
, etc. will now work as expected, rather than being ignored.
Cached Multiple Negatives Ranking Loss (CMNRL) (#1759)
MultipleNegativesRankingLoss (MNRL) is a powerful loss function that is commonly applied to train embedding models. It uses in-batch negative sampling to produce a large number of negative pairs, giving the model a training signal to push the embeddings of these pairs apart. It is commonly shown that a larger batch size results in better performing models (Qu et al., 2021, Li et al., 2023), but a larger batch size requires more VRAM in practice.
To counteract that, @kwang2049 has implemented a slightly modified GradCache technique that is able to separate the batch computation into mini-batches without any reduction in training quality. This allows the common practitioner to train with competitive batch sizes, e.g. 65536!
The downside is that training with Cached MNRL (CMNRL) is roughly 2 to 2.4 times slower than using normal MNRL.
CachedMultipleNegativesRankingLoss is a drop-in replacement for MultipleNegativesRankingLoss, but with a new mini_batch_size argument. I recommend trying out CMNRL with a large batch size and a fairly small mini_batch_size: the largest mini batch size that will fit into memory.
from sentence_transformers import SentenceTransformer, losses, InputExample
from torch.utils.data import DataLoader
model = SentenceTransformer("distilbert-base-uncased")
train_examples = [
    InputExample(texts=['Anchor 1', 'Positive 1']),
    InputExample(texts=['Anchor 2', 'Positive 2']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1024)  # Here we can try much larger batch sizes!
train_loss = losses.CachedMultipleNegativesRankingLoss(model=model, mini_batch_size=32)
model.fit([(train_dataloader, train_loss)], ...)
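To make the in-batch negative sampling behind MNRL more concrete, here is a minimal pure-Python sketch of the loss for one batch. It is not the library's implementation and does not show the gradient caching itself; the function names are hypothetical, and the cosine similarity and scale of 20 are choices made for this sketch. For each anchor, its own positive is the correct "class" and every other positive in the batch acts as a negative.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,  # assumed value for this sketch
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_score = max(scores)
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other positive

aligned_loss = in_batch_negatives_loss(anchors, aligned)
swapped_loss = in_batch_negatives_loss(anchors, swapped)
print(aligned_loss < swapped_loss)  # True
```

A larger batch adds more candidates to every row of scores, which is why the batch size matters so much for this loss.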
Community Detection (#1879, #2277, #2381)
This release updates the community_detection function in various ways. Notably:
- It should no longer run forever when there is only one community (d8982c9).
- A new show_progress_bar option has been added (#1879).
- The first item in each community is now the cluster centroid, and all subsequent items are sorted by similarity to the centroid (#2277).
- Processing speed on GPUs has been heavily improved (#2381).
In the below graph, master refers to Sentence Transformers v2.2.2 and refactor refers to v2.3.0. On GPU, the computation time was heavily reduced.
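For intuition, the ordering guarantee described above (centroid first, then members sorted by similarity) can be sketched with a simplified, pure-Python threshold-based procedure. This is not the library's implementation: the function detect_communities, its default values, and the rule of skipping candidates whose centroid was already claimed are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def detect_communities(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.75,
    min_community_size: int = 2,
) -> List[List[int]]:
    """Simplified sketch: greedy, non-overlapping communities of item indices."""
    n = len(embeddings)
    sims = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]

    # One candidate community per item: everything at least `threshold` similar
    # to it, with the item itself (the centroid) first, then by similarity.
    candidates = []
    for i in range(n):
        members = [j for j in range(n) if sims[i][j] >= threshold]
        if len(members) >= min_community_size:
            members.sort(key=lambda j: (j != i, -sims[i][j]))
            candidates.append(members)

    # Largest candidates first; skip candidates whose centroid is already
    # claimed and drop members that belong to an earlier community.
    candidates.sort(key=len, reverse=True)
    communities: List[List[int]] = []
    taken = set()
    for members in candidates:
        if members[0] in taken:
            continue
        remaining = [j for j in members if j not in taken]
        if len(remaining) >= min_community_size:
            communities.append(remaining)
            taken.update(remaining)
    return communities


embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],  # three items near the x-axis
    [0.0, 1.0], [0.1, 0.9],              # two items near the y-axis
    [-1.0, -1.0],                        # an outlier
]
communities = detect_communities(embeddings)
print(communities)  # [[0, 1, 2], [3, 4]]
```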
Updated Dependencies (#2376, #2432)
Sentence Transformers has deprecated Python 3.7 following its end of security support. Additionally, various dependencies have been updated to prevent functionality from breaking. In particular:
torch>=1.11.0
transformers>=4.32.0
huggingface_hub>=0.15.1
Lastly, torchvision has been removed as a dependency.
Additional Highlights
See the following for a list of release highlights:
- Add weighted mean & last token pooling for SGPT support by @Muennighoff (#1613)
- Prevent community_detection from running forever by @nreimers (d8982c9)
- Allow loading private transformers models by @su-park (#1682)
- Add support for multilingual T5 encoders (db34d38)
- Reduce RAM usage in InformationRetrievalEvaluator and util.semantic_search by @kwang2049 (#1715)
- Automatically place models on MPS if available by @nikitajz (#2342)
- Add a progress bar for community detection by @Marlon154 (#1879)
- Simplify tests, add CI, patch paraphrase_mining_embeddings by @tomaarsen (#2350)
- Remove unused torchvision dependency by @dvruette (#1881)
- Introduce Pillow as a dependency by @tomaarsen (#2374)
- Remove Python 3.7 support by @tomaarsen (#2375)
- Refactor model loading, no more unnecessary file downloads by @tomaarsen (#2345)
- Prevent to(...) from getting ignored, replace ._target_device with .device by @tomaarsen (#2351)
- Add normalize_embeddings support to multi-process encoding by @tomaarsen (#2377)
- Fix multi-process encoding on CUDA devices by @tomaarsen (#2377)
- Simplify & fix save_to_hub, remove git dependency, add token argument by @tomaarsen (#2376)
- Update dependencies: transformers>=4.32.0 and huggingface_hub>=0.15.1 by @tomaarsen (#2376)
- Simplify the smart_batching_collate function by @vsuarezpaniagua (#1852)
- Fix indexing of lasttoken pooling for longest sequence by @ssharpe42 (#2111)
- Set the Linear device equal to the main model device in SoftmaxLoss by @tomaarsen (#2378)
- Ensure the first item in each community is the cluster centroid in community_detection by @dyaaalbakour (#2277)
- Improve efficiency of community detection on GPU by @tomaarsen (#2381)
- Use the library_name metadata in the model card by @tomaarsen (#2386)
- Fix error when encoding empty list with convert_to_tensor=True by @oToToT (#1775)
- Add return type hints to util methods by @zachschillaci27 (#1754)
- Also accept word2vec format in WordEmbeddings by @mokha (#1875)
- Fix LSTM layer on newer torch versions by @lambdaofgod (#1420)
- Pass token and trust_remote_code to tokenizer_args too by @tomaarsen (#2411)
- If neither cache_folder nor SENTENCE_TRANSFORMERS_HOME is set, use HF default cache by @tomaarsen (#2412)
- replace unittest with pytest by @bwanglzu (#2407)
- Add GradCache + MNRL: Go beyond GPU-memory limit for MNRL by @kwang2049 (#1759)
- Add revision to load a specific model version by @tomaarsen (#2419)
- Add @k at the end of csv file name for RerankingEvaluator by @milistu (#2427)
- bump the minimum supported torch version to 1.11 by @statelesshz (#2432)