Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize locating coordinates in convert_neurosynth_to_dataset #682

Merged
merged 2 commits into from
May 25, 2022

Conversation

ryanhammonds
Copy link
Contributor

Closes #666.

Changes proposed in this pull request:

  • small optimization that reduces convert_neurosynth_to_dataset runtime from ~7min to ~3min.
  • I profiled these changes with:
from nimare.extract import fetch_neurosynth
from nimare.io import convert_neurosynth_to_dataset

neurosynth_db = fetch_neurosynth(
    data_dir='.',
    version="7",
    source="abstract",
    vocab="terms",
)[0]

neurosynth_dset = convert_neurosynth_to_dataset(
    coordinates_file=neurosynth_db["coordinates"],
    metadata_file=neurosynth_db["metadata"],
    annotations_files=neurosynth_db["features"],
    feature_groups=['alzheimer']
)

@welcome
Copy link

welcome bot commented May 25, 2022

Thanks for opening this pull request! We have detected this is the first time you have contributed to NiMARE. Please check out our contributing guidelines.
We invite you to list yourself as a NiMARE contributor, so if your name is not already mentioned, please modify the .zenodo.json file with your data right above Angie's entry. Example:

{
  "name": "Contributor, New",
  "affiliation": "Department of Psychology, Some University",
  "orcid": "<your id>"
},
{
  "name": "Laird, Angela R.",
  "affiliation": "Florida International University",
  "orcid": "0000-0003-3379-8744"
},

Of course, if you want to opt out this time there is no problem at all with adding your name later. You will be always welcome to add it in the future whenever you feel it should be listed.

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #682 (a767254) into main (ceaebc3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #682   +/-   ##
=======================================
  Coverage   85.24%   85.25%           
=======================================
  Files          41       41           
  Lines        4479     4482    +3     
=======================================
+ Hits         3818     3821    +3     
  Misses        661      661           
Impacted Files Coverage Δ
nimare/io.py 95.14% <100.00%> (+0.07%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ceaebc3...a767254. Read the comment docs.

@tsalo tsalo added the refactoring Requesting changes to the code which do not impact behavior label May 25, 2022
@tsalo
Copy link
Member

tsalo commented May 25, 2022

@ryanhammonds Thank you for the contribution! It looks like the linter is complaining, though. Can you run black on your code?

Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, and I was able to reproduce the profiling results locally. @adelavega, since you opened the original issue, I wanted to see if this is what you were thinking of before merging.

@tsalo
Copy link
Member

tsalo commented May 25, 2022

Also, @ryanhammonds, do you want to add yourself to the Zenodo file?

@ryanhammonds
Copy link
Contributor Author

Thanks @tsalo! No need to add me to the Zenodo file. Maybe if I make larger contributions in the future.

@adelavega adelavega self-requested a review May 25, 2022 22:37
Copy link
Member

@adelavega adelavega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. LGTM. I'm a bit surprised how much of a difference using numpy instead of pandas makes. Good lesson for other parts of the package.

@tsalo tsalo changed the title [ENH] optimize locating coordinates Optimize locating coordinates in convert_neurosynth_to_dataset May 25, 2022
@tsalo
Copy link
Member

tsalo commented May 25, 2022

Great, thanks @adelavega!

Merging now. Thanks again for the contribution @ryanhammonds.

@tsalo tsalo merged commit 1831250 into neurostuff:main May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Requesting changes to the code which do not impact behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up convert_neurosynth_to_dataset
3 participants