Reduce memory footprint #16

hogru · 2024-03-27T15:39:57Z

Context

I used the new version (1.2) of fcd to calculate the FCD of more than 1M molecules.

Issue

In this scenario, my Python script crashed because of a large memory footprint (> 50 GB) on both macOS and Linux. This might be a local issue that depends on, among other things, the specific PyTorch version.

Suggested resolution

I amended the code in two ways:

(1) Used a different context manager for inference. This alone does not solve the issue; I made the change in addition to (2).
(2) Casting the inference result to NumPy float32 reduced the memory footprint. I am not sure why this works, since without this line the data type is already float32, and the array should not carry any additional data such as gradients.
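One possible explanation for (2), offered as a hypothesis rather than a diagnosis: `.numpy()` shares memory with the tensor it comes from, so if the model output is a view into a larger buffer (for example, the last time step sliced out of a recurrent layer's full output), every stored batch result keeps that whole buffer alive. `astype` copies by default even when the dtype does not change, which would release the larger buffer. The following NumPy sketch simulates this; the shapes and the `fake_model_output` and `retained_bytes` helpers are hypothetical and not taken from the FCD code.

```python
import numpy as np

# Hypothetical shapes, kept small for illustration; not taken from the FCD code.
BATCH, SEQ_LEN, HIDDEN = 4, 100, 8


def fake_model_output(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a model whose output is the last time step
    of a larger recurrent-layer buffer of shape (batch, seq_len, hidden)."""
    full = rng.standard_normal((BATCH, SEQ_LEN, HIDDEN), dtype=np.float32)
    return full[:, -1, :]  # a view: it keeps the whole `full` buffer alive


def retained_bytes(chunks: list[np.ndarray]) -> int:
    """Bytes kept alive by the chunks, counting each chunk's owning buffer."""
    return sum((c if c.base is None else c.base).nbytes for c in chunks)


rng = np.random.default_rng(0)
views = [fake_model_output(rng) for _ in range(3)]
# astype copies by default, even though the dtype is already float32.
copies = [fake_model_output(rng).astype(np.float32) for _ in range(3)]

print(retained_bytes(views), retained_bytes(copies))  # 38400 384
```

If this is the mechanism, an explicit `.copy()` would have the same effect as the cast; whether the FCD model actually returns such a view is not checked here.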

I calculated the FCD for smaller molecule sets of size 100,000 to check whether the FCD value remains the same, which it did in my experiments.
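The check above can be approximated offline: FCD depends only on the mean vector and covariance matrix of the activations, so comparing these statistics for float64 activations and their float32 casts shows how far the cast can move the result. This sketch uses synthetic Gaussian data as a stand-in for ChemNet activations (an assumption; no real FCD outputs are involved), and the `stats` helper is hypothetical.

```python
import numpy as np

# Synthetic stand-in for model activations (assumption: not real FCD data).
rng = np.random.default_rng(42)
act64 = rng.standard_normal((10_000, 16))
act32 = act64.astype(np.float32)


def stats(act: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean vector and covariance matrix, both accumulated in float64."""
    mu = act.mean(axis=0, dtype=np.float64)
    sigma = np.cov(act, rowvar=False)  # np.cov promotes float32 input to float64
    return mu, sigma


mu64, sigma64 = stats(act64)
mu32, sigma32 = stats(act32)

print("max |mu diff|:", np.abs(mu64 - mu32).max())
print("max |sigma diff|:", np.abs(sigma64 - sigma32).max())
```

Because the statistics themselves are computed in float64, the only difference comes from rounding each activation to float32.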

In summary, I consider this to be a minor change which helps to alleviate memory problems, at least in certain configurations.


1. Changes to `get_one_hot`
   Problems are given in:
   - #14
   - #17
   - #13
   I discarded the changes in these PRs and added more comprehensive handling of the input data in the `SmilesDataset` class and the `get_one_hot` function.
2. Imaginary components
   The Fréchet distance calculation fails in some cases because of badly conditioned matrices, as described in #15. I could not reproduce the error locally, but could on Colab. I fixed it in `calculate_frechet_distance` by checking whether the first `covmean` computation is real and, if it is not, adding a small value to the diagonal. This made it work for me, and I got the same result as the original implementation run locally.
3. Added more tests and switched to pytest.
4. As described in #16, I changed the data type of the activations to float32 in the `get_predictions` function, which saves memory for larger datasets.
5. Changes to pyproject.toml.
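Item 2 describes a retry: compute `covmean` once and, if the result is not real, add a small value to the diagonal and compute it again. The following sketch shows that pattern with `scipy.linalg.sqrtm`; the function name `frechet_distance`, the `eps` default, the extra non-finite check and the final imaginary-part tolerance are assumptions, not the code from the PR.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2) (sketch)."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    # Assumption: treat a non-finite result the same way as a complex one.
    if np.iscomplexobj(covmean) or not np.isfinite(covmean).all():
        # Badly conditioned product: add eps to both diagonals and retry.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    if np.iscomplexobj(covmean):
        # Assumption: tolerate only a small imaginary residue from round-off.
        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
            raise ValueError("covmean has a large imaginary component")
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2 * np.trace(covmean))


# Usage: the distance of a Gaussian to itself should be close to 0.
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6))
sigma = a @ a.T + np.eye(6)
mu = rng.standard_normal(6)
print(frechet_distance(mu, sigma, mu, sigma))  # close to 0
```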

renzph · 2024-04-01T16:02:03Z

Hey Stephan. Thank you so much for your input.

I changed this in the new version at FCD/fcd/fcd.py, line 79 in f806d58:

    model(batch.transpose(1, 2).float().to(device)).to("cpu").detach().numpy().astype(np.float32)

hogru added 2 commits on February 25, 2024, 21:04:
- 305b57c Fix memory leak
- b25641e Amend source comment

renzph force-pushed the master branch 2 times, most recently from 53a08c2 to f806d58, on April 1, 2024, 15:55.

renzph closed this Apr 1, 2024
