Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sparse covariance_eigh PCA using Dask #3263

Merged
merged 43 commits into from
Oct 18, 2024
Merged

Conversation

flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Sep 26, 2024

  • Release notes not necessary because:

An alternative would be to subclass PCA, but that would involve erroring out or reimplementing all of its options.

Ideally #3267 would be merged first and this one integrated into its improved decision tree.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.18%. Comparing base (121f2db) to head (97c3812).
Report is 56 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3263      +/-   ##
==========================================
+ Coverage   77.01%   77.18%   +0.16%     
==========================================
  Files         110      111       +1     
  Lines       12492    12582      +90     
==========================================
+ Hits         9621     9711      +90     
  Misses       2871     2871              
Files with missing lines Coverage Δ
src/scanpy/preprocessing/_pca/__init__.py 90.81% <100.00%> (+0.70%) ⬆️
src/scanpy/preprocessing/_pca/_dask_sparse.py 100.00% <100.00%> (ø)

@flying-sheep flying-sheep marked this pull request as ready for review September 30, 2024 11:53
@flying-sheep flying-sheep added this to the 1.11.0 milestone Sep 30, 2024
@flying-sheep flying-sheep mentioned this pull request Sep 30, 2024
Comment on lines +241 to +243
if isinstance(adata.X, DaskArray) and issparse(adata.X._meta):
reason = "TruncatedSVD is not supported for sparse Dask yet"
request.applymarker(pytest.mark.xfail(reason=reason))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Intron7 you said it would be possible to adapt the code to do TruncatedSVD as well?

I think this PR is pretty clean, so maybe we can extend it in another PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, PR into this PR if we start pre-merge

src/scanpy/preprocessing/_pca/__init__.py Outdated Show resolved Hide resolved
tests/test_pca.py Outdated Show resolved Hide resolved
@flying-sheep flying-sheep changed the title Implement sparse PCA using dask Implement sparse covariance_eigh PCA using Dask Oct 18, 2024
ilan-gold

This comment was marked as off-topic.

@ilan-gold
Copy link
Contributor

ilan-gold commented Oct 18, 2024

Can we add a test here to find out (or to at least know) how far our implementation here is from #3296 for sparse?

@flying-sheep
Copy link
Member Author

Done!

Copy link
Contributor

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything passes CI, good to go from me

tests/test_pca.py Outdated Show resolved Hide resolved
tests/test_pca.py Outdated Show resolved Hide resolved
Copy link
Member

@Intron7 Intron7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@flying-sheep flying-sheep enabled auto-merge (squash) October 18, 2024 16:19
@flying-sheep flying-sheep disabled auto-merge October 18, 2024 17:04
@flying-sheep flying-sheep merged commit bae1610 into main Oct 18, 2024
12 of 14 checks passed
@flying-sheep flying-sheep deleted the pca-dask-sparse branch October 18, 2024 17:04
@flying-sheep flying-sheep mentioned this pull request Nov 19, 2024
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dask Sparse PCA
3 participants