Reduce memory usage of KernelTransformer.transform and meta.utils.compute_kda_ma #676
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #676      +/-  ##
=========================================
+ Coverage   85.25%   85.32%   +0.06%
=========================================
  Files          41       41
  Lines        4482     4503      +21
=========================================
+ Hits         3821     3842      +21
  Misses        661      661
Continue to review full report at Codecov.
Profiler results:
- Using dense arrays: [profiler output not captured]
- Using sparse arrays: [profiler output not captured]
The change in peak memory for large datasets is promising, but I'm a little surprised by how much slower it ends up being.
Yeah, it is also much slower than memory-mapped arrays: [profiler output not captured]
@adelavega It looks like the compression performed by […]
I think we need to build up sparse arrays from coordinates directly rather than using numpy dense arrays as an intermediary. When you construct a np.zeros array, you are still taking up the memory footprint of the size of that array. I think if we want to use sparse matrices, it's most efficient if we avoid dense arrays entirely.
This might also help w/ the computation time but no promises.
I would imagine that to convert a dense array to a sparse one you have to go through all zero values, which would add computation time (and would scale linearly w/ # of studies)
nimare/meta/utils.py (Outdated)

@@ -344,7 +347,8 @@ def compute_kda_ma(
    # Use a memmapped 4D array
    kernel_data = np.memmap(memmap_filename, dtype=type(value), mode="w+", shape=kernel_shape)
else:
    kernel_data = np.zeros(kernel_shape, dtype=type(value))
    # Use a list of 3D sparse arrays
    kernel_data_lst = [None] * n_studies
If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?
Isn't it possible with sparse to construct a 4D array? In a way, I believe that's one of the main advantages of sparse.
If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?

Yes, the list of 3D sparse arrays follows the same principle as the previous implementation. The only advantage is that we avoid the memory peak by accumulating sparse arrays instead of the whole 4D array. But the dense-to-sparse and sparse-to-dense conversion is slowing everything down.

Isn't it possible with sparse to construct a 4D array? In a way, I believe that's one of the main advantages of sparse.

Yes, I think it is possible.
I was able to construct the 4D sparse array from coordinates, but this object type (sparse._coo.core.COO) is not iterable. To get this working we would need to modify the KernelTransformer. Alternatively, we could try the list of 3D sparse arrays again, but build each sparse array from coordinates instead of compressing a dense array.
nimare/meta/utils.py (Outdated)

    # Write changes to disk
    kernel_data.flush()
if kernel_data_lst[exp] is None:
    kernel_data = np.zeros(shape, dtype=type(value))
I don't think we want to create a zeros array, as this will take up the memory footprint of shape in memory, even if it's all zeros.
Agreed. It only accounts for 7 MB of the peak memory, but it will be better to start building sparse arrays from coordinates directly.
Ah okay, it's probably optimized to not take up as much memory as it could when empty.
nimare/meta/utils.py (Outdated)

if squeeze:
    kernel_data = np.squeeze(kernel_data, axis=0)
kernel_data_lst[exp] = sparse._compressed.GCXS.from_numpy(
kernel_data is a dense matrix composed of zeros at this point, right?
I think to use sparse matrices correctly, we want to avoid dense matrices at any point, and instead build a sparse matrix from the bottom up.
See: https://sparse.pydata.org/en/stable/construct.html
I think we want to create from coordinates and data, rather than from an empty numpy matrix. This should save memory & processing time.
Thanks for the review. I agree with this. Now I can see how to implement it.
I think in the KernelTransformer we do need to convert the sparse arrays to dense, since nib.Nifti1Image() expects a dense array.
Okay, yes, I think that's correct, although at that point we will have one fewer dimension (i.e., studies), so the dense array should have a much smaller footprint.
Hopefully this speeds things up. If not, I'm OK with trying scipy.sparse as well.
I modified the kernel transformer to index the MA maps in the 4D sparse array instead of iterating over it. The new changes are working, but I get a huge memory peak and a high computation time. It seems that indexing the sparse array takes time and memory. I will try creating a list of 3D sparse arrays to avoid indexing.
Deleting these two variables (https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L334, https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L351) using del […]

EDIT: I only observed the behavior while testing the 4D sparse array on the large dataset. That was the "huge memory peak" I was referring to in my previous comment.
Latest profiler results: [profiler output not captured]
Nice. Those numbers look more reasonable to me. I didn't see why 3D vs. 4D sparse should be any different unless there was a bug. Since 3D and 4D sparse seem equivalent, I would only keep one (4D) to keep the code simple. I would say, though, that this in and of itself is a satisfactory improvement in memory, and it warrants removing the memmap approach, since memmap is not that much better on memory but is much slower. Are we doing this in a separate PR? I'm still a little surprised we can't get this down a bit more, though.
I think @jdkent suggested removing the memmap. For 1000 studies, for example, the size of the sparse array is ~500 MB. However, we need to transform the experiment-wise sparse arrays (https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L327, https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L345) using the sparse array sum operation, which I think is supported: https://sparse.pydata.org/en/stable/operations.html#reductions
Returning the sparse array from the transformer and using the sparse array in the […]
The computation time increased because I had to run […]
The code has gotten a little messy trying to support both sparse and […]
Nice! That's a huge improvement now. That's the difference between running on your laptop and not. I think the performance hit is acceptable, but in the future we would want to think about ways to bring that down, since it's already a bit of a bottleneck (especially for […]).

I support dropping memmap. If you wanted to keep it clean, you could start a new PR to drop memmap, confirm tests pass, etc.; we can merge that to master, then you can rebase your current PR with those changes. I assume dropping memmap would be fairly easy, so let's try that approach if that works for you.
I'm profiling the CorrelationDecoder now to see if we get any improvement there. Is there anything else we should include in this PR?
nimare/meta/cbma/mkda.py (Outdated)

@@ -383,6 +401,8 @@ def _fit(self, dataset1, dataset2):
    del n_selected, n_unselected

    # print(cells)
Suggestion: remove the leftover "# print(cells)" debug line.
setup.cfg (Outdated)

numpy
pandas
pymare>=0.0.3rc2
requests
scikit-learn
scipy
sparse
Add a version specifier.
if not sum_overlap:
    coords = np.unique(coords, axis=1)
data = np.full(coords.shape[1], value)
kernel_data = sparse.COO(coords, data, shape=kernel_shape)
Would we want to make it parameterizable whether sparse or dense arrays are used?
cc: @tsalo
No, I think this PR looks good! I'd like to give @tsalo a chance to review, though.
I have a few small requests, but overall I think this is great. Thanks!
Co-authored-by: Alejandro de la Vega <[email protected]> Co-authored-by: Taylor Salo <[email protected]>
I have added the suggestions. Thanks for the review!
Awesome @JulioAPeraza, I'm going to go ahead and merge; it looks good.
Closes None.
Changes proposed in this pull request:
- Reduce the memory usage of KernelTransformer.transform and meta.utils.compute_kda_ma.

There is probably a better way to introduce the sparse array here, but the current proposal works well with almost no modification to the kernel transformer.