
Reduce memory usage of KernelTransformer.transform and meta.utils.compute_kda_ma #676

Merged: 18 commits into neurostuff:main from enh-sparse on May 26, 2022

Conversation

JulioAPeraza (Collaborator)

Closes None.

Changes proposed in this pull request:

  • Replace the 4D array in the (M)KDA with a list of 3D sparse arrays (one for each experiment) to avoid the high memory peak in the KernelTransformer.transform and meta.utils.compute_kda_ma.

There is probably a better way to introduce the sparse array here, but the current proposal works well with almost no modification to the kernel transformer.

@tsalo added the "refactoring" label (Requesting changes to the code which do not impact behavior) on Apr 28, 2022
codecov bot commented May 11, 2022

Codecov Report

Merging #676 (7d2fc93) into main (1831250) will increase coverage by 0.06%.
The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##             main     #676      +/-   ##
==========================================
+ Coverage   85.25%   85.32%   +0.06%
==========================================
  Files          41       41
  Lines        4482     4503      +21
==========================================
+ Hits         3821     3842      +21
  Misses        661      661
```

| Impacted Files | Coverage Δ |
| --- | --- |
| nimare/meta/cbma/base.py | 95.63% <100.00%> (ø) |
| nimare/meta/cbma/mkda.py | 97.09% <100.00%> (+0.12%) ⬆️ |
| nimare/meta/kernel.py | 80.12% <100.00%> (+0.12%) ⬆️ |
| nimare/meta/utils.py | 52.80% <100.00%> (+1.59%) ⬆️ |


@JulioAPeraza (Collaborator, author)

Profiler results:

Using dense arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 107.87 | 2 |
| Medium (100) | 913.96 | 3.3 |
| Large (1000) | 9064.67 | 27 |

Using sparse arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 64.13 | 3.6 |
| Medium (100) | 774.28 | 24 |
| Large (1000) | 3677.94 | 212 |

@JulioAPeraza JulioAPeraza marked this pull request as ready for review May 11, 2022 18:14
@tsalo (Member) commented May 11, 2022

The change in peak memory for large datasets is promising, but I'm a little surprised by how much slower it ends up being.

@JulioAPeraza (Collaborator, author)

Yeah, it is also much slower than memory-mapped arrays:

| Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- |
| Small (10) | 52.74 | 2 |
| Medium (100) | 210.08 | 5 |
| Large (1000) | 2065.77 | 103 |
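For context, the memory-mapped variant being compared works roughly like this (a sketch with hypothetical shapes, mirroring the `np.memmap` call in `compute_kda_ma`):

```python
import os
import tempfile
import numpy as np

kernel_shape = (3, 10, 10, 10)  # (n_studies, x, y, z); hypothetical sizes
memmap_filename = os.path.join(tempfile.mkdtemp(), "kernel.dat")

# Backed by a file on disk; only the pages actually touched stay in RAM
kernel_data = np.memmap(
    memmap_filename, dtype=np.float64, mode="w+", shape=kernel_shape
)
kernel_data[0, 4, 4, 4] = 1.0
kernel_data.flush()  # write changes to disk

# Re-open read-only to confirm the data persisted
readback = np.memmap(
    memmap_filename, dtype=np.float64, mode="r", shape=kernel_shape
)
```

The trade-off is visible in the profiler numbers: RAM stays low, but disk I/O dominates the runtime for large datasets.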

@JulioAPeraza (Collaborator, author)

@adelavega It looks like the compression performed by pydata/sparse takes a lot of time. An alternative approach could be to reshape the MRI data arrays and use scipy/sparse, which seems to be more efficient in terms of memory and computation time.

@adelavega (Member) left a comment

I think we need to build up the sparse arrays from coordinates directly, rather than using dense numpy arrays as an intermediary.

When you construct a np.zeros array, you still take up the full memory footprint of that array. If we want to use sparse matrices, it's most efficient to avoid dense arrays entirely.

This might also help with the computation time, but no promises. I would imagine that converting a dense array to a sparse one requires scanning all of its zero values, which adds computation time (and scales linearly with the number of studies).

```
@@ -344,7 +347,8 @@ def compute_kda_ma(
    # Use a memmapped 4D array
    kernel_data = np.memmap(memmap_filename, dtype=type(value), mode="w+", shape=kernel_shape)
else:
    kernel_data = np.zeros(kernel_shape, dtype=type(value))
    # Use a list of 3D sparse arrays
    kernel_data_lst = [None] * n_studies
```
Member:

If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?

Isn't it possible with sparse to construct a 4D array? I believe that's one of the main advantages of sparse.

Collaborator (author):

> If I'm reading correctly, the idea here is to use a list of 3D sparse arrays?

Yes, the list of 3D sparse arrays follows the same principle as the previous implementation. The advantage is that we avoid the memory peak by accumulating sparse arrays instead of the whole 4D array. But the dense-to-sparse and sparse-to-dense conversion is slowing everything down.

> Isn't it possible with sparse to construct a 4D array?

Yes, I think it is possible.

Collaborator (author):

I was able to construct the 4D sparse array from coordinates, but this object type (sparse._coo.core.COO) is not iterable. To get this working we would need to modify the KernelTransformer. Alternatively, we could try the list of 3D sparse arrays again, but build each sparse array from coordinates instead of compressing a dense array.

```
# Write changes to disk
kernel_data.flush()
if kernel_data_lst[exp] is None:
    kernel_data = np.zeros(shape, dtype=type(value))
```
Member:

I don't think we want to create a zeros array, as this will take up the full memory footprint of shape, even if it's all zeros.

Collaborator (author):

Agreed. It is only 7MB for the peak memory, but it will be better to start using sparse arrays from coordinates directly.

Member:

Ah okay, it's probably optimized to not take up as much memory as it could when empty


if squeeze:
kernel_data = np.squeeze(kernel_data, axis=0)
kernel_data_lst[exp] = sparse._compressed.GCXS.from_numpy(
Member:

kernel_data is a dense matrix composed of zeros at this point, right?

To use sparse matrices correctly, I think we want to avoid dense matrices at any point and instead build a sparse matrix from the bottom up.

See: https://sparse.pydata.org/en/stable/construct.html

I think we want to create from coordinates and data, rather than from an empty numpy matrix. This should save memory and processing time.

Collaborator (author):

Thanks for the review. I agree with this; now I can see how to implement it.
I think in the KernelTransformer we do need to convert the sparse arrays to dense, since nib.Nifti1Image() expects a dense array.

Member:

Okay, yes, I think that's correct, although at that point we will have one less dimension (i.e., studies), so the dense array should have a much smaller footprint.

Hopefully this speeds things up. If not, I'm OK with trying scipy.sparse as well.

@adelavega (Member)

@tsalo @jdkent will you double check my review?

@JulioAPeraza (Collaborator, author)

I modified the kernel transformer to index the MA maps in the 4D sparse array instead of iterating over it. The new changes are working, but I get a huge memory peak and a high computation time. It seems that indexing the sparse array takes time and memory. I will try creating a list of 3D sparse arrays to avoid indexing.

@JulioAPeraza (Collaborator, author) commented May 17, 2022

Deleting these two variables (https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L334, https://github.com/neurostuff/NiMARE/blob/main/nimare/meta/cbma/mkda.py#L351) using del does not free up the memory. This is causing a high memory peak when the MA map of the second dataset is generated.
Calling the Garbage Collector (gc.collect()) after each del solved the issue.

EDIT: I only observed the behavior while testing the 4D Sparse Array in the large dataset. That was the "huge memory peak" I was referring to in my previous comment.

@JulioAPeraza (Collaborator, author)

Latest profiler results:

| Method | Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- | --- |
| 4D Dense Array (Current Default) | Small (10) | 108 | 2 |
| | Medium (100) | 914 | 3 |
| | Large (1000) | 9,065 | 33 |
| | Very Large (10000) | 90,571 | 1,166 |
| 4D Memory-Mapped Array | Small (10) | 53 | 2 |
| | Medium (100) | 210 | 5 |
| | Large (1000) | 2,065 | 106 |
| | Very Large (10000) | 20,621 | 2,139 |
| 4D Sparse Array (Proposed Default) | Small (10) | 66 | 11 |
| | Medium (100) | 382 | 14 |
| | Large (1000) | 3,679 | 48 |
| | Very Large (10000) | 36,649 | 993 |
| List of 3D Sparse Arrays | Small (10) | 52 | 3 |
| | Medium (100) | 369 | 7 |
| | Large (1000) | 3,666 | 67 |
| | Very Large (10000) | 36,635 | 1,177 |

@adelavega (Member)

Nice. Those numbers look more reasonable to me. I don't see why 3D vs. 4D sparse should be any different unless there was a bug. Since 3D and 4D sparse seem equivalent, I would keep only one (4D) to keep the code simple.

I would say, though, that this in and of itself is a satisfactory improvement in memory, and it warrants removing the memmap approach, since memmap is not that much better on memory but is much slower. Are we doing that in a separate PR?

I'm still a little surprised we can't get this down a bit more, though.
What is the object size of the 4D object? For example, for 1000 studies, what's the size of just the sparse array?
That would help us know if there are other places where we are using a lot of memory.

@JulioAPeraza (Collaborator, author)

I think @jdkent suggested removing the memmap approach in a separate PR.

For 1000 studies, for example, the size of the sparse array is ~500MB. However, we need to transform experiment-wise sparse arrays (transformed_maps[0][idx]) to dense to create the nib.Nifti1Image object (nib.Nifti1Image expects a numpy array), and also to mask the data (n-dimensional indexing is not supported in pydata/sparse). So, I think we end up accumulating a lot of memory in the variable imgs (~2MB per image = 2GB for 1000 studies) after each loop, plus the 500MB of the sparse array.
One solution could be to return the 4D sparse array directly and perform the sum here:

https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L327

https://github.com/JulioAPeraza/NiMARE/blob/enh-sparse/nimare/meta/cbma/mkda.py#L345

using the sparse array sum operation which I think is supported: https://sparse.pydata.org/en/stable/operations.html#reductions

@JulioAPeraza (Collaborator, author)

Returning the sparse array from the transformer and using it in the MKDA fit shows a considerable improvement in the memory peak:

| Method | Dataset (n_studies) | Peak Memory (MB) | Time (s) |
| --- | --- | --- | --- |
| 4D Sparse Array (use sparse in mkda) | Small (10) | 64 | 5 |
| | Medium (100) | 172 | 9 |
| | Large (1000) | 1,607 | 60 |
| | Very Large (10000) | 15,947 | 1,117 |

The computation time increased because I had to run np.unique on the coordinates for COO.sum(axis=0) to work well.

@JulioAPeraza (Collaborator, author)

The code has gotten a little messy trying to support both sparse and memmap arrays.
I think it would be better to just drop memory-mapped arrays in this PR. Thoughts?

@adelavega (Member) commented May 21, 2022

Nice! That's a huge improvement now. That's the difference between being able to run this on your laptop and not.

I think the performance hit is acceptable, but in the future we would want to think about ways to bring that down, since it's already a bit of a bottleneck (especially for Correctors). We can potentially tackle that in a separate PR, though.

I support dropping memmap. If you want to keep it clean, you could start a new PR to drop memmap, confirm the tests pass, etc.; we can merge that to master, and then you can rebase your current PR on those changes.

I assume dropping memmap would be fairly easy, so let's try that approach if it works for you.

@JulioAPeraza (Collaborator, author)

I'm profiling the CorrelationDecoder now to see if we get any improvement there. Is there anything else we should include in this PR?

```
@@ -383,6 +401,8 @@ def _fit(self, dataset1, dataset2):

    del n_selected, n_unselected

    # print(cells)
```
Member:

Suggested change: remove the `# print(cells)` line.

setup.cfg (outdated):

```
numpy
pandas
pymare>=0.0.3rc2
requests
scikit-learn
scipy
sparse
```
Member:

Add a version constraint.

```
if not sum_overlap:
    coords = np.unique(coords, axis=1)
data = np.full(coords.shape[1], value)
kernel_data = sparse.COO(coords, data, shape=kernel_shape)
```
Member:

Would we want to make it parameterizable whether sparse or dense arrays are used?

cc: @tsalo

@adelavega (Member)

No, I think this PR looks good! I'd like to give @tsalo a chance to review though.

@tsalo (Member) left a comment

I have a few small requests, but overall I think this is great. Thanks!

@JulioAPeraza (Collaborator, author)

I have added the suggestions. Thanks for the review!

@adelavega (Member)

Awesome @JulioAPeraza

I'm going to go ahead and merge, it looks good.

@adelavega adelavega merged commit 744bc0d into neurostuff:main May 26, 2022
@JulioAPeraza JulioAPeraza deleted the enh-sparse branch May 31, 2022 21:53