
Optimize cython find_points_in_spheres #3015

Merged
merged 29 commits into master May 25, 2023

Conversation

lbluque
Contributor

@lbluque lbluque commented May 25, 2023

Summary

Optimized the Cython code in find_points_in_spheres, getting ~5x faster runtime.

For example, here are the runtimes for finding all neighbors within a 5 Å cutoff in a 32-atom FCC structure (on my laptop with an Intel i7 (2.8 GHz) CPU):

  • version 2023.5.10: 955 µs ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
  • this PR: 115 µs ± 2.69 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
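For context, here is a minimal pure-NumPy sketch of the computation find_points_in_spheres performs, written as a brute-force pairwise search without periodic boundary conditions (the function name and return layout are illustrative, not the actual API; the real Cython implementation handles periodic images and uses a linked cell-list algorithm, which is what makes it fast):

```python
import numpy as np

def points_in_spheres_bruteforce(all_coords, center_coords, r):
    """Brute-force reference: for each center, find all points within radius r.

    Returns (center_indices, point_indices, distances), analogous in spirit to
    the index/distance arrays returned by find_points_in_spheres.
    """
    # pairwise displacement tensor, shape (n_centers, n_points, 3)
    diff = center_coords[:, None, :] - all_coords[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)  # shape (n_centers, n_points)
    center_idx, point_idx = np.nonzero(dists <= r)
    return center_idx, point_idx, dists[center_idx, point_idx]

# tiny usage example: two points, one center at the origin, 5.0 cutoff
pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 10.0]])
centers = np.array([[0.0, 0.0, 0.0]])
ci, pi, d = points_in_spheres_bruteforce(pts, centers, r=5.0)
print(ci, pi, d)  # only the point at distance 1.0 falls inside the cutoff
```

The brute-force version is O(N·M) in the number of centers and points; the cell-list approach in the Cython code only inspects points in neighboring cells, which is where the speedup comes from.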

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.

Before a pull request can be merged, the following items must be checked:

  • Doc strings have been added in the Google docstring format. Run pydocstyle on your code.
  • Type annotations are highly encouraged. Run mypy path/to/file.py to type check your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

Our CI will run all the above checks, but it may be more efficient to fix most errors before submitting the PR. We highly recommend installing pre-commit hooks. Simply run

pip install -U pre-commit
pre-commit install

in the repo's root directory. Once pre-commit has installed git hooks, our linters will run before every commit and abort if issues pop up.

janosh and others added 28 commits May 25, 2023 12:51
…pymatgen/optimization/linear_assignment.c'"

also remove continue-on-error: true from release step in test.yml
Until we can be sure non-orthogonal lattices will work
…ct#2772)

* added dos_fingerprint and similarity index methods

* Added test cases and reformatted and cleaned code

* added binwidth, renamed states to densities in fp obj, updated tests

* added source link

* changed get_dos_fp_similarity and fp_to_dict methods to static

* Delete Test.py, unnecessary file

* simplified dict updating, added missing type annotations

* NamedTuple return type fixed

* small clean up

* document get_dos_fp() and get_dos_fp_similarity() raise conditions in doc str

* add types for fp1,fp2 and update doc str

* update exception tests

Co-authored-by: anaik <[email protected]>
Co-authored-by: Janosh Riebesell <[email protected]>
* Added a property assignment structure for testing. Tests not yet written.

* Modified test file. Removed NLCC_POTENTIALS.yaml, which has no use at the moment.

* Commits to this branch will be updates to defects module. Added WIP updates to the corrections module, which should include some simpler corrections like point charge. Added a WIP update to the thermodynamics module, which is going to include support for predominance diagrams in the near future

* Defect module WIP

* Try else got tripped unnecessarily. Fixed with conditional.

Also removed a print statement.

* Updated import statement.

* Updated import statement.

* Updated import statement.

* Fixed charge transfer, which was not working.

* Slightly better version of last commit

* Plotting updates.

* A very subtle error... Everything in the sets/inputs preserves the
ordering of the sites, except when you use the get_unique_site_indices
function to get aliases such as {'Mg_1': [...], 'Mg_2': [...], 'O_1': [...]}.

Then, the sites are given by an alias: index dictionary that doesn't preserve
order, and so sites can get jumbled when this happens. It doesn't always happen, so
it was hard to identify. Now, when aliasing is used for the Coord section,
the keywords are created in order of their indices. To further protect against
these sorts of issues for unforeseen problems, I'm changing "self.keywords"
to be stored as an OrderedDict once they are provided to the Section. Since
Ordered Dicts inherit from Dicts, this shouldn't change anything.

Lastly, I have applied black formatting to everything to keep consistent.

* mypy on github (but not on my machine, oddly) didn't like the use of
aliases.get because SupportsLessThan is not part of it, even though
it worked in practice. While this new method is less elegant, it still
works well enough.

* Reorder import. Couple of formatting corrections from pylint.

* Update plotter.

* Created a polaron class as a child of Substitution.

* Trying out a "RelaxationTransformation" to help stitch together the
transformation history of a parent and child task.

* Tweak as_dict() summary.

* The GhostVacancy is a special vacancy object for CP2K calculations where
the pseudopotential of the atom is turned off, but the basis functions are
left behind.

* added dtype=object to np.array

* Update import statement

* Assembled basis sets, pseudopotential, and DFT+U settings into a single
settings file, which should be cleaner than using separate ones.

* Updated in accordance with new settings.yaml file.

* Update in accordance with new settings.yaml file.

* Update to one of the hubbard values.

* Fixed the getting of hubbard values.

* Updated predominance diagrams

* A temporary measure on my dev branch to avoid Potcar hashing with CP2K calculations

* These files are no longer necessary, as settings.yaml will hold all
this info.

* Update attribute.

* Update attribute

* Update the plot with "combine_charges" to make cleaner figures

* Keywords must have strings for names. Numbers will cause problems.

* Update to the mo_eigenvalues parsing for non-OT compatibility. Parsing a raw
text file is incredibly sloppy.

* Update to the file reading. Subsections can have "subsection params",
some of which are used to alias the subsection, but some of which
are variables like "on" or "off". They are problematic from the
point of view of the Section objects. This will fix the issue of cp2k
restart files automatically assigning some of these superfluous
params, but is not a totally robust solution.

* Change to default diagonalization params.

* Update dict band gap.

* Updating mo_eigenvalues parsing, which is getting troublesome mixing
OT and diagonalization FWs now.

* Modifications to the DOS parser so that it's more integrated with the
Dos module. Also made the DOS parser the default method for extracting
gaps, vbm, and cbm, which works for either OT or diagonalization.

* Update for aux_basis set. "match" keyword will now use primary basis, which
can be used for defect sites. Starting to get lots of spaghetti code here.
Also fixed indentation error.

* Added aux_basis argument

* Don't assume initial structure was parsed.

* Need hubbard > 0

* Update to diagonalization settings.

* Section Lists

* Reject if sizes don't match

* Testing new plot.

* Possible hash for defects.

* Nevermind.

* Remove warning.

* Update plus_u and xc_functional

* fix

* Update for functional processing. Do not like this as it relies on
having the input file present, but the text in the output file is too
variable to parse reliably.

* Still need to figure out how to get mos automatically. Upped it for
now. Elec temp of 500 maybe have been a little too high.

* Change how output parser gets the functional.

* Use the getattr operation to be case insensitive.

* Update section descriptions.

* So bader can use reference charges with cubes. Will submit patch and merge.

* This function should not default to cm^-3 units. Most/all of pymatgen does things per formula unit / composition object. So that's how this should be as well. The exponential is for an individual atom, and the multiplicity attribute will convert it to a per f.u. basis.

* Dos objects

* Use structure matcher instead of exact equality to deal with empty
site properties.

* fixing cube

* Fix for getting items. Now there is a separate get function for keyword
and section, so internal methods can be more robust.

* Update to docstring and import.

* Adsorbent class.

* Concentration should be in formula units by default.

* (1) prettier plots

(2) get_doping() added since n/p are reused.

* Docstring.

* Add rotations

* forgot to make msonable

* Kpoints keyword

Scheme keyword was missing. Led to errors.

* Updates for functionals

Some updates for functionals to make LibXC slightly easier.

* Parse initial structure

For some reason, one example did not show the default blank line. Not sure if it was the new version of cp2k or something else, but make the blank line a conditional now.

* Parse initial structure

Last commit did not cover edge case. This should work.

* Updates.

Fix readline for both cases of coord table. Modify sets for hybrids. New utility function for truncation radius.

* optFIT basis

Opt fit basis sets should be preferred for C and B. These are not part of standard dist. and should be added to settings as well.

* Convergence criteria

Update the force convergence criteria to roughly match the MPRelaxSet

* Add option for strict updates.

Update now supports optional "strict" keyword to avoid inserting new keywords and sections.

* Update hybrid

Slight tweaks to logic for hybrid sets.

* Update test

Nonsense structure cannot use dummy element. Sets check validity of element inputs now.

* Update functionals

Hybrid was not being registered for PBE0 since only one functional is used.

* Add cutoff radius

Include cutoff radius even if not used in order to suppress warnings.

* Update core defects

Adsorbate is the species that adsorbs onto a surface; adsorbent is the surface doing the adsorbing. Also the default defect concentration is back to cm-3. This is not ultimately desirable, but it's causing too many problems.

* Updates

* Cutoff radius

Do not override. Just provide a warning. Also make the long range correction default to provide better energies when cutoff is very small.

* Fix cutoff

* Changes and Fixes

Newline in inputs.py

Adjust stride in sets.py

Read energies in outputs.py

* Update args

* Fix

* stress tensor

* Allow positive

* Updates

* New Thermo

* Remove temp

Temporary things were saved during a laptop crash. Now removed.

* SectionList

Modify the file reader with SectionList

* Double loop

* Drop undeveloped corrections and work on 2d correction

* Get section explicit

* Dont check if not running freysoldt

* Rename

Predominance diagram may confuse people. Will use Brouwer diagram instead.

DFT plus U is very finicky in cp2k due to basis sets. It only really works with LOWDIN scheme, which is only available in newer versions of the code, and even then it is prone to instabilities. For these reasons, plus u should not be the default.

* Formatting updates

* Deprecation upcoming

* Formatting

* Formatting

* Formatting

* Defect Complex and Linting

* Typing

* Emergency push

Laptop malfunctioning need to sync

* Update to correction

* Bandstructures

Input objects for band structure and ability to parse the band structure text files generated by cp2k.

* Revisions and Bandstructure support

Some formatting has been updated for new linting. Bandstructure support has been added to inputs and output parsers. Some updates to DOS parsers as well.

* Kpoints

* Format

* Revert to master

* Last minute fix. Needs to access num_kpts

* Needs to access num_kpts

* Old test and type check

* Some more lint

* Unused import

* Lint

* override_default_params

must not be none

* None->{}

* Add self

* Loops

* remove sigma

* A few changes to tests and test files

* fix assert

* AssertEqual

* Kpoints

* Dict already preserves ordering since py3.7

* use .items()

* EPS_PGF_ORB

* Gamma kpoint

* Basis and Potential

* None

* Dict

* startswith

* get element

* kpts

* Only use trunc if periodic

* Only use trunc if periodic

* ADMM Keywords

* Activate motion

* Delete

* runtype check

* molecules

* processor

* dft set

* multiplicity

* Exchange correction function

* pmg config

* Test set

* Basis and Potentials

* pmg config

* Molecule

* kpts

* Some tests

* Test sets

* Lintings

* Test files and tests

* Lint

* More lint

* Don't use built-in name

* list->List, dict->Dict

* Refactor pydantic to msonable

* Delete settings.yaml

* mypy 0.991 upgrade

* Util types and lint

* Uncomment a line

* Missing arg

* type

* arg

* Dftd3

Might as well add this

* Don't use literal

* Json

A hack so that jsanitized objects still have int keys

* Test for aux

* Tiny fixes

* rename some vars in pymatgen/cli/pmg_config.py to snake case

rename single-letter vars (discouraged by flake8)

* breaking: change CP2K _postprocessor parsing error type from OSError to ValueError

complete pymatgen/io/cp2k/utils.py _(pre|post)processor doc strs
add type anno

* Pre/postprocessor

* add return types in pymatgen/io/cp2k/sets.py

required for mypy to analyze functions, found potential dict unpacking error in DftSet subclasses
fix typos

Co-authored-by: Janosh Riebesell <[email protected]>
Signed-off-by: lbluque <[email protected]>
@shyuep
Member

shyuep commented May 25, 2023

@lbluque NICE! Very useful for all structure manipulations! Merging as the lint errors are unrelated to this PR.

@shyuep shyuep merged commit a237d19 into materialsproject:master May 25, 2023
@janosh
Member

janosh commented May 25, 2023

@stichri Since you did a great job on #2983 checking for memory leaks, it would be hugely appreciated if you have the time to take these changes for a spin and let us know if you notice any issues. Always good to have another pair of eyes on performance critical code. 🙏

@janosh janosh added the enhancement (A new feature or improvement to an existing one) and cython (Performance critical Cython code) labels May 25, 2023
@stichri
Contributor

stichri commented May 26, 2023

I quickly tried the new code and it indeed looks like there is no issue with leaking memory:
[memtest memory-usage plot]

Just for curiosity's sake, I also quickly tried to look at it actually using valgrind - to that end, I:

  • realized the suppression file maintained for cpython really does not seem to suppress anything for me when running a script containing just a single pass statement with python3.11
  • therefore generated my own suppression file for everything that happens before calling find_points_in_spheres - by commenting out everything from line 15 on in the trial script as linked above and running
    valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --error-limit=no --gen-suppressions=all --log-file=all.log python ./memtest.py
    
    followed by
    cat ./all.log | ./parse_valgrind_suppressions.sh > all.supp
    
    using parse_valgrind_suppressions.sh from here
  • got to see valgrind's output (of arguably false positives) for a single call of find_points_in_spheres - by uncommenting line 14 in the trial script as linked above and running
    valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes --suppressions=all.supp python ./memtest.py
    

When the new code was compiled with debugging symbols (as I've done by adding extra_compile_args=['-ggdb3', '-O0'] to the Extension call on line 173 of pymatgen's setup.py), I am ultimately left with what seems to be 32 bytes in 2 blocks worth of arguably false positives:
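For reference, a sketch of how such a debug build could be declared (the extension name, source path, and line number are illustrative, not verbatim from pymatgen's setup.py; note the two compiler flags must be separate list items, otherwise the compiler receives one malformed argument):

```python
from setuptools import Extension

# Illustrative Extension entry: debug symbols on, optimization off, so that
# valgrind's stack traces carry usable line information.
ext = Extension(
    "pymatgen.optimization.neighbors",   # assumed module name
    sources=["pymatgen/optimization/neighbors.pyx"],
    extra_compile_args=["-ggdb3", "-O0"],  # two separate flags, not "-ggdb3,-O0"
)
print(ext.extra_compile_args)
```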

==20882== Memcheck, a memory error detector
==20882== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==20882== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==20882== Command: python ./memtest.py
==20882==
==20882==
==20882== HEAP SUMMARY:
==20882==     in use at exit: 19,679,587 bytes in 149,555 blocks
==20882==   total heap usage: 1,146,146 allocs, 996,591 frees, 204,187,029 bytes allocated
==20882==
==20882== 16 bytes in 1 blocks are still reachable in loss record 120 of 21,505
==20882==    at 0x484182F: malloc (vg_replace_malloc.c:431)
==20882==    by 0x6259E62: npy_alloc_cache_dim (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62B7AC0: PyArray_NewFromDescr_int (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BAA92: PyArray_NewLikeArrayWithShape (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BAD51: PyArray_FromArray (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BB473: PyArray_FromAny (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BB5E6: PyArray_CheckFromAny (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x63588A7: array_array (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x4A1ED89: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:443)
==20882==    by 0x383FF5DF: __Pyx_PyObject_Call (neighbors.c:25731)
==20882==    by 0x383FF5DF: __Pyx__PyObject_CallOneArg (neighbors.c:25770)
==20882==    by 0x3841FF22: __pyx_pf_8pymatgen_12optimization_9neighbors_find_points_in_spheres.isra.0 (neighbors.c:6096)
==20882==    by 0x384239A5: __pyx_pw_8pymatgen_12optimization_9neighbors_1find_points_in_spheres (neighbors.c:3309)
==20882==
==20882== 16 bytes in 1 blocks are still reachable in loss record 121 of 21,505
==20882==    at 0x484182F: malloc (vg_replace_malloc.c:431)
==20882==    by 0x6259E62: npy_alloc_cache_dim (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62B7AC0: PyArray_NewFromDescr_int (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BAA92: PyArray_NewLikeArrayWithShape (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BAD51: PyArray_FromArray (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BB473: PyArray_FromAny (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x62BB5E6: PyArray_CheckFromAny (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x63588A7: array_array (in /home/chrs/venv3.11-pmg-neighbors/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==20882==    by 0x4A1ED89: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:443)
==20882==    by 0x383FF5DF: __Pyx_PyObject_Call (neighbors.c:25731)
==20882==    by 0x383FF5DF: __Pyx__PyObject_CallOneArg (neighbors.c:25770)
==20882==    by 0x38420208: __pyx_pf_8pymatgen_12optimization_9neighbors_find_points_in_spheres.isra.0 (neighbors.c:6184)
==20882==    by 0x384239A5: __pyx_pw_8pymatgen_12optimization_9neighbors_1find_points_in_spheres (neighbors.c:3309)
==20882==
==20882== LEAK SUMMARY:
==20882==    definitely lost: 0 bytes in 0 blocks
==20882==    indirectly lost: 0 bytes in 0 blocks
==20882==      possibly lost: 0 bytes in 0 blocks
==20882==    still reachable: 32 bytes in 2 blocks
==20882==         suppressed: 19,679,555 bytes in 149,553 blocks
==20882==
==20882== For lists of detected and suppressed errors, rerun with: -s
==20882== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 16378 from 15300)

Note the reported offending code lines, e.g. neighbors.c:3309, refer to C code generated by Cython, which is therefore pretty much unreadable.
So in conclusion, I'm afraid this approach has limited utility without additional tooling (which I wouldn't be aware of) that allows correlating the offending Cython-generated code with the actual .pyx code as written by the user ...

There probably is a much better way that I'm unfortunately not aware of ... 😅

@janosh
Copy link
Member

janosh commented May 26, 2023

Damn, memory leak detection in Cython is a hell of a complex task. Thanks for taking the time and being so thorough!

If it's just 32 bytes out of 20 MiB, arguably we shouldn't sweat it. 😄 I'll just assume it's a timing issue of objects having gone out of scope but the garbage collector not having done a cleanup yet.
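A quick stdlib-only way to sanity-check that hypothesis: force a garbage collection between iterations and watch whether traced memory grows (run_once below is a hypothetical stand-in for the repeated find_points_in_spheres call; explicit gc.collect() calls rule out the "collector hasn't run yet" explanation):

```python
import gc
import tracemalloc

def run_once():
    # hypothetical stand-in for a call that allocates and releases memory,
    # e.g. one find_points_in_spheres invocation on a test structure
    return [list(range(1000)) for _ in range(10)]

tracemalloc.start()
baseline = None
for _ in range(5):
    run_once()       # result discarded, so its memory should be reclaimable
    gc.collect()     # force collection before measuring
    current, _peak = tracemalloc.get_traced_memory()
    if baseline is None:
        baseline = current  # footprint after the first call
growth = current - baseline
print(f"traced memory growth over repeated calls: {growth} bytes")
tracemalloc.stop()
```

If growth stays flat across iterations with forced collections, the residual bytes valgrind reports are most likely caches or interpreter-lifetime allocations rather than a per-call leak.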
