Optimize numpy operations in MKDADensity Estimator and (M)KDAKernel #685
Conversation
Thanks for opening this pull request! We have detected this is the first time you have contributed to NiMARE. Please check out our contributing guidelines.
Of course, if you want to opt out this time there is no problem at all with adding your name later. You will always be welcome to add it in the future whenever you feel it should be listed.
From this issue: numpy/numpy#11136. It seems to be significantly faster. All in all, for a dataset with ~100 studies, it went from ~0.53 s to ~0.13 s, possibly shaving ~0.4 s off of ~1.5 s for the
Replaced with the scikit-image version: https://github.com/scikit-image/scikit-image/blob/64103e6c90917fcfdef8343fd7dd4df32c910446/skimage/util/unique.py#L47. Down to 0.962869 s from ~1.6 s in the last comparison.
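For context, a minimal sketch of the row-deduplication trick discussed here, modeled on the skimage helper linked above (the function name and docstring are illustrative, not NiMARE's actual implementation):

```python
import numpy as np

def unique_rows(ar):
    """Find unique rows by viewing each row as one opaque item.

    Collapsing each row into a single np.void element lets np.unique
    run in 1-D, which is much faster than its lexicographic handling
    of 2-D input (see numpy/numpy#11136).
    """
    ar = np.ascontiguousarray(ar)
    row_view = ar.view(np.dtype((np.void, ar.dtype.itemsize * ar.shape[1])))
    _, unique_idx = np.unique(row_view, return_index=True)
    # Sort the first-occurrence indices to preserve order of appearance.
    return ar[np.sort(unique_idx)]

coords = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3]])
deduped = unique_rows(coords)  # duplicate row dropped
```

This avoids the full lexicographic sort of rows that `np.unique(ar, axis=0)` performs, at the cost of requiring a contiguous array.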
Profiling results using the "neurosynth_laird_studies.json" dataset (1117 coordinates in 17 studies).
It's looking like a 2.5x+ speed up, just optimizing
Update, after a few more tweaks: ~2.94x speed up
Codecov Report
@@ Coverage Diff @@
## main #685 +/- ##
==========================================
+ Coverage 85.28% 85.31% +0.02%
==========================================
Files 41 41
Lines 4507 4521 +14
==========================================
+ Hits 3844 3857 +13
- Misses 663 664 +1
Continue to review full report at Codecov.
Ugh,
# return ma_values.T.dot(self.weight_vec_).ravel()
weighted_ma_vals = ma_values * self.weight_vec_
return weighted_ma_vals.sum(0)
return ma_values.T.dot(self.weight_vec_).ravel()
This seems like a potential edge case that is not worth the slowdown, so I went with the faster version.
Anyone with a Mac want to verify whether this is still an issue?
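For reference, the two formulations in the diff above compute the same weighted sum over experiments; the shapes here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shapes: 5 experiments x 100 voxels, one weight per experiment.
ma_values = rng.random((5, 100))
weight_vec = rng.random((5, 1))

# Elementwise multiply then sum over experiments (the commented-out workaround).
workaround = (ma_values * weight_vec).sum(0)
# Single matrix product (the restored fast path).
fast = ma_values.T.dot(weight_vec).ravel()
```

Both produce a length-100 vector of weighted sums; the dot-product form lets BLAS do the reduction in one call instead of materializing the intermediate weighted array.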
I can test it out tomorrow.
tests.test_meta_mkda.test_MKDADensity_montecarlo_null
(after changing n_cores to 2 and n_iters to 10) went from 6.52 s to 5.68 s after making the change on my MacBook Pro (macOS 12.3.1). Maybe the architecture Tal was referring to is the M1 chip? Another possibility is that the problem was unique to multiprocessing, and when I switched to joblib it resolved the issue without anyone noticing.
In case anyone wants to dig further into it, the commit where the workaround was added is 68c515a.
nimare/meta/kernel.py
Outdated
imgs = []
# Loop over exp ids since sparse._coo.core.COO is not iterable
for i_exp, id_ in enumerate(transformed_maps[1]):
    if isinstance(transformed_maps[0][i_exp], sparse._coo.core.COO):
@JulioAPeraza is this check necessary? It's slow. I don't see why it wouldn't be a sparse matrix, right?
Apparently this check is necessary, because without it the tests fail. However, it shouldn't be needed, because the output should be sparse here.
It's not in ALEKernel, however.
This is inconsistent with the API, because if you say return_type='sparse' you will get back a dense matrix with an ALEKernel.
Moved to #692.
Sorry that I missed this comment. I created a new folder for GitHub email, and I didn't get notified of this one. I added a comment to the new issue.
I think you must have used a different version of black, because black should already have been run on all files, and the linter is currently failing.
By the way, I think the other major speed up we could potentially get is related to
Currently,
with
This means we spend a lot of time going from sparse --> dense. We save a tiny bit of memory doing this because we loop over experiments and only keep the masked voxels, but it's slow. Starting from
@tsalo I believe this is ready to go as soon as you approve.
I think I understand the changes to the code, and they look good to me.
I have some notes on the comments, though: mostly just adding TODOs to comments we might want to tackle in a future PR.
I totally forgot: can you also add
Yes, and actually I moved it to
Agreed! I'm sure there'll be a few places where it could come in handy.
The changes look good to me. I might write a test or two for the new function (or just copy them from skimage) in a future PR, but I don't want to block this one any longer. Once CI passes, feel free to merge.
I'm just going to change the title slightly, to match the pattern in other PRs.
EDIT: I also realized the changes were made to the kernel and the MKDA (rather than KDA) Estimator.
np.unique -> unique_rows

Profile summary: around a ~2.5-3x speed up for correct_fwe_montecarlo on a dataset with ~1000 coordinates.