[`feat` + `fix`] Add `normalize_embeddings` support to multi-process encoding; fix multi-process encoding on CUDA devices #2377

tomaarsen · 2023-12-13T14:21:44Z

Supersedes #2252

Hello!

Pull Request overview

- Add `normalize_embeddings` support to multi-process encoding
- Fix multi-process encoding on CUDA-enabled devices
  - In particular, it caused all model weights to become 0.0
- Add tests

Details

The normalize_embeddings functionality is fairly straightforward - it speaks for itself.
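
For readers unfamiliar with the option, here is a minimal sketch of what normalization means, assuming `normalize_embeddings` scales each embedding to unit L2 length as it does in the regular `encode` method. The helper `l2_normalize` is a hypothetical name used only for illustration; it is not part of the library, which operates on tensors rather than Python lists.

```python
import math
from typing import List


def l2_normalize(embedding: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 length (hypothetical helper, illustration only)."""
    norm = math.sqrt(sum(x * x for x in embedding))
    # Clamp the norm so an all-zero vector does not cause a division by zero.
    return [x / max(norm, eps) for x in embedding]


unit = l2_normalize([3.0, 4.0])
print(unit)
```

Once embeddings have unit length, their dot product equals their cosine similarity, which is the usual reason to request normalization at encode time.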

The encoding fix is less obvious. When the model is loaded on CUDA and `start_multi_process_pool` is called, the model weights become 0.0 as soon as one of the processes starts. There's a tad more info here. This does not happen when the model is on CPU originally, so I move the model to CPU just prior to starting the encoding. I also call `share_memory`, as it's recommended.
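
The workaround can be sketched roughly as follows. This is an illustration under assumptions, not the exact code in this PR: `prepare_model_for_workers` is a hypothetical helper name, and a small `torch.nn.Linear` stands in for the actual model.

```python
import torch


def prepare_model_for_workers(model: torch.nn.Module) -> torch.nn.Module:
    """Hypothetical helper: make a model safe to hand to worker processes."""
    # Move all parameters and buffers to CPU (in-place for nn.Module).
    model.to("cpu")
    # Place the parameters and buffers in shared memory so that worker
    # processes can access the same storage instead of copying it.
    model.share_memory()
    return model


# Stand-in for the real model (assumption, for illustration only).
model = prepare_model_for_workers(torch.nn.Linear(4, 2))
print(all(p.is_shared() for p in model.parameters()))
```

In this sketch, each worker process would then move its own copy to its target device after it has started.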

Thanks @TeisNP for starting this work!

Tom Aarsen

TeisNP and others added 7 commits July 11, 2023 11:53

- e1a18e7: added normalize option to multiprocess encode
- 5c73ab9: Merge branch 'master' into pr-2252
- f7b37f4: Rename some variables
- 8186bc4: Add missing docstring
- 4561191: Update logger text
- 05d96a5: Moving to CPU is required, otherwise all weights become 0.0
- bdf06df: Update test_encode_multi_process tests

tomaarsen merged commit 8af4744 into UKPLab:master Dec 13, 2023
8 checks passed

tomaarsen deleted the feat/multi-process-encode-normalize branch December 13, 2023 15:31

tomaarsen mentioned this pull request Dec 13, 2023

added normalize option to multiprocess encode #2252

Closed
