fix backward pass for cached losses #3114
Conversation
@@ -230,14 +230,15 @@ def embed_minibatch_iter(
    def calculate_loss_and_cache_gradients(self, reps: list[list[Tensor]], reps_guided: list[list[Tensor]]) -> Tensor:
        """Generalized function to calculate the cross-entropy loss and cache the gradients wrt. the embeddings."""
        loss = self.calculate_loss(reps, reps_guided)
Suggested change:
-        loss = self.calculate_loss(reps, reps_guided)
+        loss = self.calculate_loss(reps, reps_guided, with_backward=True)
I believe this is missing for CGIST.
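Roughly, and as a sketch rather than the library's actual code, `with_backward=True` makes `calculate_loss` backpropagate each minibatch's (scaled) loss immediately, so the gradients w.r.t. the detached embeddings are cached and the returned value no longer carries a graph:

```python
# Rough sketch of the `with_backward=True` pattern used by the cached losses
# (placeholder loss and simplified signature, not the actual implementation).
import torch
from torch import Tensor

def calculate_loss(reps: list[Tensor], with_backward: bool = False) -> Tensor:
    """`reps` stands in for detached minibatch embeddings with requires_grad enabled."""
    total_loss = torch.tensor(0.0, device=reps[0].device)
    for minibatch in reps:
        loss = minibatch.pow(2).mean() / len(reps)  # placeholder for the real cross-entropy term
        if with_backward:
            loss.backward()       # gradients are cached on the detached embeddings
            loss = loss.detach()  # keep only the value; the minibatch graph is freed
        total_loss = total_loss + loss
    return total_loss
```

With that in place, `calculate_loss_and_cache_gradients` only has to pass `with_backward=True`, which is what the suggestion above adds for `CachedGISTEmbedLoss`.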
My initial tests with CMNRL and CMNRL + Matryoshka are promising, and I can indeed also reproduce the high memory usage. Matryoshka on top of CMNRL only seems to add about 20%-25% training time, which seems fine. CMNRL does seem a decent bit slower than pure MNRL (more than the 10-20% that I thought it was), but based on your changes in this PR and the previous one, that slowdown shouldn't be related to your changes at all.
Nice! I assume that 0292b9b means that we perform evaluations more efficiently? I think this is ready to go.
Yes, with 0292b9b the loss can now also be called with torch.no_grad().
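For illustration (the helper and its arguments here are hypothetical, not from the repo), this means an evaluation loop can now do something like:

```python
import torch

def evaluation_loss(loss_fn, sentence_features, labels) -> float:
    # loss_fn would be e.g. CachedMultipleNegativesRankingLoss(model); under
    # torch.no_grad() it returns the loss value without caching any gradients.
    with torch.no_grad():
        return loss_fn(sentence_features, labels).item()
```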
Thanks a bunch for having another look at this @Marcel256 🤗
Thanks a lot @tomaarsen and @Marcel256 for this fix; it would be extremely helpful for my work on fine-tuning embedding models. Is there an expected timeline for when there will be a new release of sentence-transformers?
Hello @kuanhsieh! Normally my releases are just whenever there's been a sufficiently big change that warrants an update, which has historically been about once a month. However, due to the holidays/new years, and also the ModernBERT release, things have been a bit slower. I can however try to prioritize a smaller release in the near future. Perhaps some time next week? Until then, feel free to train with the bleeding edge GitHub version:
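A typical way to install the bleeding-edge version straight from the repository (shown here as a general example, not necessarily the exact command from the original comment) is `pip install git+https://github.com/UKPLab/sentence-transformers.git`.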
The trained model should work on much older versions too, even <v3, I reckon.
Thank you very much for the update and for all the work on ModernBERT! I will use the GitHub version in the meantime.
Hello @kuanhsieh, I've made the v3.4.0 release: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.4.0
Here is a draft fix for the backward pass in the cached losses, while still maintaining compatibility with the Matryoshka loss. The problem is a bit more difficult than I originally thought, because we need to detach the tensor before doing the minibatch loss computation.
If anyone has suggestions for improvements, let me know 😀
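For readers new to the caching trick, here is a rough sketch of the detach-then-compute idea described above (assumed names and a placeholder loss, not the actual implementation): the embeddings are detached before the minibatch loss computation, so backward() stops at the cached tensors instead of traversing the whole model; a second pass then re-encodes with gradients enabled and pushes the cached gradients through the model.

```python
import torch
from torch import Tensor

def cache_gradients(minibatch_reps: list[Tensor]) -> tuple[float, list[Tensor]]:
    """Detach the embeddings, compute the minibatch losses, and cache d(loss)/d(embedding)."""
    detached = [rep.detach().requires_grad_() for rep in minibatch_reps]
    loss = torch.stack([d.pow(2).mean() for d in detached]).mean()  # placeholder loss
    loss.backward()  # backward() stops at the detached embeddings; the model is untouched
    return loss.item(), [d.grad for d in detached]

# In a second pass, the embeddings are re-computed with gradients enabled and the
# cached gradients are propagated through the model, e.g. via
# torch.autograd.backward(reps, cached_grads).
```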