
Reduce memory usage of KernelTransformer.transform and meta.utils.compute_kda_ma #676

Merged: 18 commits into neurostuff:main from enh-sparse on May 26, 2022

Conversation

JulioAPeraza (Collaborator)

Closes None.

Changes proposed in this pull request:

  • Replace the 4D array in the (M)KDA with a list of 3D sparse arrays (one for each experiment) to avoid the high memory peak in the KernelTransformer.transform and meta.utils.compute_kda_ma.

There is probably a better way to introduce the sparse array here, but the current proposal works well with almost no modification to the kernel transformer.

@tsalo added the "refactoring" label (Requesting changes to the code which do not impact behavior) on Apr 28, 2022
codecov bot commented May 11, 2022

Codecov Report

Merging #676 (7d2fc93) into main (1831250) will increase coverage by 0.06%.
The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##             main     #676      +/-   ##
==========================================
+ Coverage   85.25%   85.32%   +0.06%
==========================================
  Files          41       41
  Lines        4482     4503      +21
==========================================
+ Hits         3821     3842      +21
  Misses        661      661
```

| Impacted Files | Coverage Δ |
| --- | --- |
| nimare/meta/cbma/base.py | 95.63% <100.00%> (ø) |
| nimare/meta/cbma/mkda.py | 97.09% <100.00%> (+0.12%) ⬆️ |
| nimare/meta/kernel.py | 80.12% <100.00%> (+0.12%) ⬆️ |
| nimare/meta/utils.py | 52.80% <100.00%> (+1.59%) ⬆️ |


@JulioAPeraza (Collaborator, author)

Profiler results:

Using dense arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 107.87 | 2 |
| Medium (100) | 913.96 | 3.3 |
| Large (1000) | 9064.67 | 27 |

Using sparse arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 64.13 | 3.6 |
| Medium (100) | 774.28 | 24 |
| Large (1000) | 3677.94 | 212 |

@JulioAPeraza JulioAPeraza marked this pull request as ready for review May 11, 2022 18:14
@tsalo (Member) commented May 11, 2022

The change in peak memory for large datasets is promising, but I'm a little surprised by how much slower it ends up being.

@JulioAPeraza (Collaborator, author)

Yeah, it is also much slower than memory-mapped arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 52.74 | 2 |
| Medium (100) | 210.08 | 5 |
| Large (1000) | 2065.77 | 103 |
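For context, the memory-mapped variant being compared works roughly like this (a sketch with hypothetical shapes, mirroring the `np.memmap` call in `compute_kda_ma`):

```python
import os
import tempfile
import numpy as np

kernel_shape = (3, 10, 10, 10)  # (n_studies, x, y, z); hypothetical sizes
memmap_filename = os.path.join(tempfile.mkdtemp(), "kernel.dat")

# Backed by a file on disk; only the pages actually touched stay in RAM
kernel_data = np.memmap(
    memmap_filename, dtype=np.float64, mode="w+", shape=kernel_shape
)
kernel_data[0, 4, 4, 4] = 1.0
kernel_data.flush()  # write changes to disk

# Re-open read-only to confirm the data persisted
readback = np.memmap(
    memmap_filename, dtype=np.float64, mode="r", shape=kernel_shape
)
```

The trade-off is visible in the profiler numbers: RAM stays low, but disk I/O dominates the runtime for large datasets.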

@JulioAPeraza (Collaborator, author)

@adelavega It looks like the compression performed by pydata/sparse takes a lot of time. An alternative approach could be to reshape the MRI data arrays and use scipy/sparse, which seems to be more efficient in terms of memory and computation time.

@adelavega (Member) left a comment

I think we need to build up the sparse arrays from coordinates directly, rather than using dense numpy arrays as an intermediary.

When you construct a np.zeros array, you still take up the full memory footprint of that array. If we want to use sparse matrices, it's most efficient to avoid dense arrays entirely.

This might also help with the computation time, but no promises. I would imagine that converting a dense array to a sparse one requires scanning all of its zero values, which adds computation time (and scales linearly with the number of studies).

```
@@ -344,7 +347,8 @@ def compute_kda_ma(
    # Use a memmapped 4D array
    kernel_data = np.memmap(memmap_filename, dtype=type(value), mode="w+", shape=kernel_shape)
else:
    kernel_data = np.zeros(kernel_shape, dtype=type(value))
    # Use a list of 3D sparse arrays
    kernel_data_lst = [None] * n_studies
```
Member:

If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?

Isn't it possible with sparse to construct a 4D array? I believe that's one of the main advantages of sparse.

Collaborator (author):

> If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?

Yes, the list of 3D sparse arrays follows the same principle as the previous implementation. The advantage is that we avoid the memory peak by accumulating sparse arrays instead of the whole 4D array. But the dense-to-sparse and sparse-to-dense conversion is slowing everything down.

> Isn't it possible with sparse to construct a 4D array?

Yes, I think it is possible.

Collaborator (author):

I was able to construct the 4D sparse array from coordinates, but this object type (sparse._coo.core.COO) is not iterable. To get this working we would need to modify the KernelTransformer. Alternatively, we could try the list of 3D sparse arrays again, but build each sparse array from coordinates instead of compressing a dense array.

```
# Write changes to disk
kernel_data.flush()
if kernel_data_lst[exp] is None:
    kernel_data = np.zeros(shape, dtype=type(value))
```
Member:

I don't think we want to create a zeros array, as this will take up the full memory footprint of shape, even if it's all zeros.

Collaborator (author):

Agreed. It is only 7MB for the peak memory, but it will be better to start using sparse arrays from coordinates directly.

Member:

Ah okay, it's probably optimized to not take up as much memory as it could when empty


if squeeze:
kernel_data = np.squeeze(kernel_data, axis=0)
kernel_data_lst[exp] = sparse._compressed.GCXS.from_numpy(
Member:

kernel_data is a dense matrix composed of zeros at this point, right?

To use sparse matrices correctly, I think we want to avoid dense matrices at any point and instead build a sparse matrix from the bottom up.

See: https://sparse.pydata.org/en/stable/construct.html

I think we want to create from coordinates and data, rather than from an empty numpy matrix. This should save memory and processing time.

Collaborator (author):

Thanks for the review. I agree with this; now I can see how to implement it.
I think in the KernelTransformer we do need to convert the sparse arrays to dense, since nib.Nifti1Image() expects a dense array.

Member:

Okay, yes, I think that's correct, although at that point we will have one less dimension (i.e., studies), so the dense array should have a much smaller footprint.

Hopefully this speeds things up. If not, I'm OK with trying scipy.sparse as well.

@adelavega (Member)

@tsalo @jdkent will you double check my review?

@JulioAPeraza (Collaborator, author)

I modified the kernel transformer to index the MA maps in the 4D sparse array instead of iterating over it. The new changes are working, but I get a huge memory peak and a high computation time. It seems that indexing the sparse array takes time and memory. I will try creating a list of 3D sparse arrays to avoid indexing.

@JulioAPeraza (Collaborator, author) commented May 17, 2022

Deleting these two variables (https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L334, https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L351) using del does not free up the memory. This is causing a high memory peak when the MA map of the second dataset is generated.
Calling the Garbage Collector (gc.collect()) after each del solved the issue.

EDIT: I only observed the behavior while testing the 4D Sparse Array in the large dataset. That was the "huge memory peak" I was referring to in my previous comment.

@JulioAPeraza (Collaborator, author)

Latest profiler results:

| Method | Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- | --- |
| 4D Dense Array (Current Default) | Small (10) | 108 | 2 |
| | Medium (100) | 914 | 3 |
| | Large (1000) | 9,065 | 33 |
| | Very Large (10000) | 90,571 | 1,166 |
| 4D Memory-Mapped Array | Small (10) | 53 | 2 |
| | Medium (100) | 210 | 5 |
| | Large (1000) | 2,065 | 106 |
| | Very Large (10000) | 20,621 | 2,139 |
| 4D Sparse Array (Proposed Default) | Small (10) | 66 | 11 |
| | Medium (100) | 382 | 14 |
| | Large (1000) | 3,679 | 48 |
| | Very Large (10000) | 36,649 | 993 |
| List of 3D Sparse Arrays | Small (10) | 52 | 3 |
| | Medium (100) | 369 | 7 |
| | Large (1000) | 3,666 | 67 |
| | Very Large (10000) | 36,635 | 1,177 |

@adelavega (Member)

Nice. Those numbers look more reasonable to me. I don't see why 3D vs. 4D sparse should be any different unless there was a bug. Since 3D and 4D sparse seem equivalent, I would keep only one (4D) to keep the code simple.

I would say, though, that this in and of itself is a satisfactory improvement in memory, and it warrants removing the memmap approach, since memmap is not that much better on memory but is much slower. Are we doing that in a separate PR?

I'm still a little surprised we can't get this down a bit more, though.
What is the object size of the 4D object? For example, for 1000 studies, what's the size of just the sparse array?
That would help us know if there are other places where we are using a lot of memory.

@JulioAPeraza (Collaborator, author)

I think @jdkent suggested removing the memmap approach in a separate PR.

For 1000 studies, for example, the size of the sparse array is ~500MB. However, we need to transform experiment-wise sparse arrays (transformed_maps[0][idx]) to dense to create the nib.Nifti1Image object (nib.Nifti1Image expects a numpy array), and also to mask the data (n-dimensional indexing is not supported in pydata/sparse). So, I think we end up accumulating a lot of memory in the variable imgs (~2MB per image = 2GB for 1000 studies) after each loop, plus the 500MB of the sparse array.
One solution could be to return the 4D sparse array directly and perform the sum here:

https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L327

https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L345

using the sparse array sum operation which I think is supported: https://sparse.pydata.org/en/stable/operations.html#reductions

@JulioAPeraza (Collaborator, author)

Returning the sparse array from the transformer and using it in the MKDA fit shows a considerable improvement in the memory peak:

| Method | Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- | --- |
| 4D Sparse Array (use sparse in mkda) | Small (10) | 64 | 5 |
| | Medium (100) | 172 | 9 |
| | Large (1000) | 1,607 | 60 |
| | Very Large (10000) | 15,947 | 1,117 |

The computation time increased because I had to run np.unique on the coordinates for COO.sum(axis=0) to work well.

@JulioAPeraza (Collaborator, author)

The code has gotten a little messy trying to support both sparse and memmap arrays.
I think it would be better to just drop memory-mapped arrays in this PR. Thoughts?

@adelavega (Member) commented May 21, 2022

Nice! That's a huge improvement now. That's the difference between being able to run this on your laptop and not.

I think the performance hit is acceptable, but in the future we would want to think about ways to bring that down, since it's already a bit of a bottleneck (especially for Correctors). We can potentially tackle that in a separate PR, though.

I support dropping memmap. If you want to keep it clean, you could start a new PR to drop memmap, confirm the tests pass, etc.; we can merge that to master, and then you can rebase your current PR on those changes.

I assume dropping memmap would be fairly easy, so let's try that approach if it works for you.

@JulioAPeraza (Collaborator, author)

I'm profiling the CorrelationDecoder now to see if we get any improvement there. Is there anything else we should include in this PR?

```
@@ -383,6 +401,8 @@ def _fit(self, dataset1, dataset2):

    del n_selected, n_unselected

    # print(cells)
```
Member:

Suggested change: remove the `# print(cells)` line.

setup.cfg (outdated):

```
numpy
pandas
pymare>=0.0.3rc2
requests
scikit-learn
scipy
sparse
```
Member:

Add a version constraint.

```
if not sum_overlap:
    coords = np.unique(coords, axis=1)
data = np.full(coords.shape[1], value)
kernel_data = sparse.COO(coords, data, shape=kernel_shape)
```
Member:

Would we want to make it parameterizable whether sparse or dense arrays are used?

cc: @tsalo

@adelavega (Member)

No, I think this PR looks good! I'd like to give @tsalo a chance to review though.

@tsalo (Member) left a comment

I have a few small requests, but overall I think this is great. Thanks!

@JulioAPeraza (Collaborator, author)

I have added the suggestions. Thanks for the review!

@adelavega (Member)

Awesome @JulioAPeraza

I'm going to go ahead and merge, it looks good.

@adelavega adelavega merged commit 744bc0d into neurostuff:main May 26, 2022
@JulioAPeraza JulioAPeraza deleted the enh-sparse branch May 31, 2022 21:53