
Fix for memory leak in pbc_shortest_vectors #2983

Merged: 3 commits merged into materialsproject:master on May 10, 2023

Conversation

@stichri (Contributor) commented on May 9, 2023

Summary

The pbc_shortest_vectors function leaks all of the memory allocated to store the fractional coordinates of the surrounding periodic sites. As a result, memory consumption keeps creeping up when the function is called heavily; see the memory profile below of a script of mine that made me aware of the problem. The suggested modifications eliminate the issue completely and keep the memory consumption of the same script in check.

With the suggested minimal modifications, additional memory is allocated (and freed accordingly) only when needed, i.e. when periodic boundary conditions are applied selectively rather than in all three directions.
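For concreteness, a call that exercises this selective-PBC path could look like the following minimal sketch (not part of the PR; the lattice and coordinates are arbitrary):

```python
# Minimal sketch of a call with selectively applied PBC. With the fix, this path
# allocates and frees its own buffer of image coordinates; before the fix, the
# allocated buffer was leaked on every call.
import numpy as np
from pymatgen.core import Lattice
from pymatgen.util.coord import pbc_shortest_vectors

# Periodic in a and b only, so the precomputed full-PBC images cannot be used.
lattice = Lattice(np.eye(3) * 5.0, pbc=(True, True, False))

fcoords1 = np.random.rand(10, 3)
fcoords2 = np.random.rand(10, 3)

vectors = pbc_shortest_vectors(lattice, fcoords1, fcoords2)
print(vectors.shape)  # (10, 10, 3)
```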

Before: [memory profile of the script with pymatgen v2023.5.8]

After: [memory profile of the same script with the fix applied]

@shyuep (Member) commented on May 9, 2023

Great. Thanks! I have a small request: can you provide a simple benchmark of the difference in speed (if any) between the old and new versions? This is a rather core part of the code, and I want to make sure that efficiency is not compromised.

@janosh added the labels awaiting user (Needs more information from OP), fix (Bug fix PRs) and memory (Excessive memory use and leaks) on May 10, 2023
@@ -178,6 +179,8 @@ def pbc_shortest_vectors(lattice, fcoords1, fcoords2, mask=None, return_d2=False
                 for l in range(3):
                     vs[i, j, l] = 1e20

+    if not (n_pbc == 3):
+        free(&frac_im[0,0])
Review comment (Member):

These changes don't look like they should affect performance. Not sure how fast the free operation is, but considering we're currently leaking memory, some slowdown is definitely preferable.

A second review comment (Member):

Just a minor point: find_pbc_vector is very, very performance-critical (this is why it is in Cython). I am willing to lose < 10% performance for memory, but if it were a 10x difference (which is not difficult to imagine), we would need to consider the tradeoff, since basically all operations involving bond distances would slow down. As the performance loss here seems minimal, this is OK.

@janosh (Member) commented on May 10, 2023

I've never written a test for Cython code. @stichri, is there a way we can ensure there are no regressions on memory leaks?

@stichri (Contributor, Author) commented on May 10, 2023

As requested by @shyuep, I wrote this very simple benchmark.
The idea is to assess the overall runtime impact, which I think plays out along two parameters: the relative amount of work done per call (either lots or little) and the applied periodic boundary conditions (PBC), either in all directions or in just one direction. Generally, I'd expect the largest impact from the avoided malloc calls when doing little work per call with PBC in all directions, and from the additional free calls when doing little work per call with PBC in just one direction. A rough sketch of the benchmark structure follows the numbers below.
Running the benchmark with pymatgen v2023.5.8 under Python 3.11 on my mobile AMD Ryzen 7 (Fedora 38 in WSL2 on Windows 11) results in:

PBC in ALL directions:
fastest out of 10 runs with 100000 calls each, LITTLE WORK per call: 1.76117e+00s
fastest out of 10 runs with 100 calls each, LOTS OF WORK per call: 3.22905e+00s
PBC in ONE direction:
fastest out of 10 runs with 100000 calls each, LITTLE WORK per call: 7.77626e-01s
fastest out of 10 runs with 100 calls each, LOTS OF WORK per call: 6.76336e-01s

The same benchmark on the same machine with the modifications applied results in:

PBC in ALL directions:
fastest out of 10 runs with 100000 calls each, LITTLE WORK per call: 1.68359e+00s
fastest out of 10 runs with 100 calls each, LOTS OF WORK per call: 3.28771e+00s
PBC in ONE direction:
fastest out of 10 runs with 100000 calls each, LITTLE WORK per call: 7.95001e-01s
fastest out of 10 runs with 100 calls each, LOTS OF WORK per call: 7.19025e-01s
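For reference, the structure of the benchmark is roughly the following sketch (this is not the linked script itself; the site counts, call counts and number of repeats are illustrative):

```python
# Rough sketch of the benchmark layout: time many calls doing little work per
# call and few calls doing lots of work per call, for full and selective PBC.
import timeit

import numpy as np
from pymatgen.core import Lattice
from pymatgen.util.coord import pbc_shortest_vectors


def fastest_run(pbc, n_sites, n_calls, n_runs=10):
    """Fastest total time out of n_runs runs of n_calls calls each."""
    lattice = Lattice(np.eye(3) * 10.0, pbc=pbc)
    fc1 = np.random.rand(n_sites, 3)
    fc2 = np.random.rand(n_sites, 3)
    return min(
        timeit.repeat(
            lambda: pbc_shortest_vectors(lattice, fc1, fc2),
            number=n_calls,
            repeat=n_runs,
        )
    )


for pbc, label in [((True, True, True), "ALL directions"), ((True, False, False), "ONE direction")]:
    print(f"PBC in {label}:")
    print(f"  little work per call: {fastest_run(pbc, 2, 100_000):.5e}s")
    print(f"  lots of work per call: {fastest_run(pbc, 200, 100):.5e}s")
```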

Judging from this very rough benchmarking, I would assess the impact on runtimes to be typically less than 10%.
Moreover, should that turn out to be an issue, I think this additional memory management could be eliminated altogether with some further modifications: the application of PBC is ultimately controlled by a boolean triplet in the passed lattice object, so there are only 8 different cases to which the images_view variable would ever need to be assigned, just as is currently done for the special case of full PBC. The additional 7 cases could thus be hardcoded into constant module-level objects as well.
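A hypothetical sketch of that alternative (the name IMAGES_BY_PBC and the layout are illustrative, not actual pymatgen code):

```python
# Precompute one constant image array per pbc triplet at module level, so that
# no per-call allocation is needed: 27 images for full PBC, down to a single
# (0, 0, 0) image when no direction is periodic.
import itertools

import numpy as np

_OFFSETS = {True: (-1, 0, 1), False: (0,)}

IMAGES_BY_PBC = {
    pbc: np.array(
        list(itertools.product(*(_OFFSETS[p] for p in pbc))), dtype=np.float64
    )
    for pbc in itertools.product((True, False), repeat=3)
}

assert IMAGES_BY_PBC[(True, True, True)].shape == (27, 3)
assert IMAGES_BY_PBC[(True, False, False)].shape == (3, 3)
```

The Cython code could then simply point images_view at the precomputed array matching the lattice's pbc triplet, with no malloc/free at all.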

Regarding additional tests for memory leaks within Cython code, as @janosh mentioned: unfortunately, I am not aware of a particularly convenient way of doing this 'from the Python side of things'.
That is, if I were to systematically check for this kind of error, I would probably use a dynamic memory analysis tool like Valgrind; after all, the CPython implementation conveniently ships with a suppression file for the false positives that show up when running Python under it. It would certainly be possible to script tests that do just that automatically, at the cost of pulling in potentially new dependencies like Valgrind.
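As a lighter-weight (if coarser) alternative, a Python-side regression test could watch the process RSS across many calls. The sketch below assumes psutil as an extra dependency; the call count and threshold are illustrative and would need tuning against the actual leak rate:

```python
# Coarse Python-side leak check: RSS should stay roughly flat across many calls.
# tracemalloc cannot see C-level malloc/free inside the extension module, which
# is why the process RSS is measured instead (via psutil, an extra dependency).
import numpy as np
import psutil
from pymatgen.core import Lattice
from pymatgen.util.coord import pbc_shortest_vectors


def test_pbc_shortest_vectors_does_not_leak():
    # Selective PBC exercises the path where the function manages its own image buffer.
    lattice = Lattice(np.eye(3) * 5.0, pbc=(True, False, False))
    fc1 = np.random.rand(4, 3)
    fc2 = np.random.rand(4, 3)

    proc = psutil.Process()
    pbc_shortest_vectors(lattice, fc1, fc2)  # warm-up call
    rss_before = proc.memory_info().rss
    for _ in range(200_000):
        pbc_shortest_vectors(lattice, fc1, fc2)
    rss_after = proc.memory_info().rss

    # Allow some slack for allocator noise; tune against the observed leak rate.
    assert rss_after - rss_before < 20 * 1024 * 1024
```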

@shyuep merged commit f77769b into materialsproject:master on May 10, 2023
@janosh (Member) commented on May 10, 2023

Thanks for the benchmark!

@shyuep (Member) commented on May 10, 2023

Thanks. Much appreciated!
