
[ENH] Notebook testing performance #383

Closed
wants to merge 5 commits into master from enh/comprehensive_test

Conversation

jdkent
Member

@jdkent jdkent commented Nov 7, 2020

Closes #None

Changes proposed in this pull request:

  • Tests the false positive rate for ALE, KDA, and MKDA.
  • Correlates p-values from ALE, KDA, and MKDA to see which methods are most correlated with one another.
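As a rough illustration of the first check: a false positive rate can be estimated from p-values obtained under a simulated null (no true effect). This sketch is hypothetical, not NiMARE code; the uniform draws stand in for a well-calibrated estimator's null p-values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder null p-values; a real test would substitute the p-values an
# estimator (ALE, KDA, or MKDA) produces on null-simulated data.
null_p_values = rng.uniform(0, 1, size=10000)

alpha = 0.05
false_positive_rate = np.mean(null_p_values < alpha)
# For a well-calibrated method, this should be close to alpha (0.05).
print(false_positive_rate)
```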

@codecov

codecov bot commented Nov 7, 2020

Codecov Report

Merging #383 (5751b49) into master (8dc6c55) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #383   +/-   ##
=======================================
  Coverage   82.32%   82.32%           
=======================================
  Files          40       40           
  Lines        3848     3848           
=======================================
  Hits         3168     3168           
  Misses        680      680           


@tsalo
Member

tsalo commented Nov 8, 2020

I think the current differences are probably due to the null method used. The default for ALE and MKDA should be analytical, while the default for KDA is still the empirical null.

Regarding the example itself, can you make the notebook render in the example gallery?

@jdkent
Member Author

jdkent commented Nov 13, 2020

I messed up some commits, so I'll get that fixed up. In the meantime, some updates on the notebook.

observations:

  • The empirical and analytic null methods correlate strongly with each other (~0.99).
  • The ALE analytic null method yields slightly more false positives than expected.
  • Both MKDA null methods yield a conservative false positive rate.
  • The ALE analytic null method's p-values are not uniformly distributed over [0, 1] (there appears to be a bias toward certain p-values).

if these two lines

step_size = 0.0001
hist_bins = np.round(np.arange(0, max_poss_ale + 0.001, step_size), 4)

are changed to:

step_size = 0.00001
hist_bins = np.round(np.arange(0, max_poss_ale + 0.001, step_size), 5)

the bias disappears in my simulations, suggesting that a certain granularity of the null distribution is necessary.
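A minimal sketch of why the step size matters: when a null distribution is stored as a binned histogram, the bin width bounds the granularity of the p-values computed from it, so coarse bins quantize p-values into a small set of levels. This is illustrative only (a generic histogram-based survival function, not NiMARE's implementation; `p_from_histogram` is a hypothetical name).

```python
import numpy as np

rng = np.random.default_rng(0)
null_stats = rng.normal(size=100000)  # stand-in for a simulated null distribution

def p_from_histogram(observed, null_stats, step_size):
    """P-values from a binned null: proportion of null mass at/above each value."""
    bins = np.arange(null_stats.min(), null_stats.max() + step_size, step_size)
    counts, edges = np.histogram(null_stats, bins=bins)
    # Survival function over bins (right tail).
    sf = np.cumsum(counts[::-1])[::-1] / counts.sum()
    idx = np.clip(np.searchsorted(edges, observed) - 1, 0, len(sf) - 1)
    return sf[idx]

observed = rng.normal(size=1000)
coarse = p_from_histogram(observed, null_stats, step_size=0.5)
fine = p_from_histogram(observed, null_stats, step_size=0.005)
# Coarser bins collapse the p-values into far fewer distinct levels.
print(len(np.unique(coarse)), len(np.unique(fine)))
```

With the finer step size there are many more distinct p-values, which is consistent with the observation that a smaller `step_size` removes the bias in the p-value distribution.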

@tsalo
Member

tsalo commented Nov 13, 2020

It seems like the MKDA analytic and empirical results are exactly the same... which is rather impressive if there are no bugs.

@jdkent
Member Author

jdkent commented Nov 13, 2020

They appear really close. ALE does better when:

step_size = 0.00001
hist_bins = np.round(np.arange(0, max_poss_ale + 0.001, step_size), 5)

The ALE analytic & empirical correlation is 0.9999446144830378.
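The agreement measure here is just a Pearson correlation between the two methods' p-value vectors. A self-contained sketch with synthetic inputs (the perturbed copy below is an assumption standing in for empirical-null p-values, not notebook data):

```python
import numpy as np

rng = np.random.default_rng(0)
p_analytic = rng.uniform(0, 1, size=5000)
# Synthetic "empirical" p-values: a slightly perturbed copy of the analytic ones.
p_empirical = np.clip(p_analytic + rng.normal(0, 0.005, size=5000), 0, 1)

r = np.corrcoef(p_analytic, p_empirical)[0, 1]
print(r)  # close to 1 when the two methods nearly agree
```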

Here are the false positive rates:
[figure: false positive rates by method]

The p-value distribution:
[figure: p-value distributions]

P-value correlation:
[figure: p-value correlation matrix]

So increasing the granularity of the hist_bins by 10x for ALE appears to bring the analytic and empirical methods really close together (though this may cut into the performance gains that Tal has been working on).

@tsalo
Member

tsalo commented Nov 13, 2020

We can profile the difference, but I don't think it'll be that big a deal. I think your newest results are a pretty compelling argument for 100000 bins.

@tsalo
Member

tsalo commented Nov 14, 2020

Regarding the example gallery, I think that we will want to use nbexamples.

Regarding merging, I think we should get #393 merged first, then an adapted version of #378, and then re-run your notebook with all three estimators and the new structure.

@jdkent jdkent force-pushed the enh/comprehensive_test branch from 4f4800c to 5751b49 on November 23, 2020 19:54
@tsalo
Member

tsalo commented Nov 26, 2020

Can you also measure and plot elapsed time in this notebook?

@tsalo
Member

tsalo commented Nov 30, 2020

I think the remaining elements that need to be added before we merge this are:

  • Test KDA analytic/empirical nulls.
  • Track and plot elapsed time for each of the methods. (Dropped for now.)
  • Render the notebook in our examples gallery using nbexamples. (Dropped for now.)

@jdkent is there anything else you think this PR needs?

@tsalo
Member

tsalo commented Dec 8, 2020

I spent some time trying to figure out how to use nbexamples and if there was a way to just render pre-run notebooks with sphinx-gallery, but I struck out on both counts. I don't think it's worth holding this up for that, so I think we should move forward with the KDA kernel and updating w.r.t. master.

@tsalo tsalo added this to the 0.0.5 milestone Dec 28, 2020
@tsalo
Member

tsalo commented Mar 27, 2021

I wonder if we should place full-on analyses, like this notebook, in a separate repository. It's useful for NiMARE users, so we should definitely link to it in the documentation, but it's not exactly an "example" and it's also very computationally intensive to run.

@jdkent
Member Author

jdkent commented Mar 29, 2021

I was recently thinking the same thing; I'll create a separate repo.

@tsalo tsalo closed this Apr 23, 2021
@tsalo
Member

tsalo commented Apr 23, 2021

Let me know when you're ready to work on the new repo. I'm happy to contribute if you want any help.
