Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dask PCA support #2563

Merged
merged 46 commits into from
Aug 21, 2023
Merged

Dask PCA support #2563

merged 46 commits into from
Aug 21, 2023

Conversation

selmanozleyen
Copy link
Member

Hi,

Just wanted to start the PR. Passes the tests except one. Also need to deal with solver names since they don't correspond to anything dask uses.

Also refactored where the DaskArray mock class is.

Pinging @ivirshup

@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #2563 (3d86e82) into master (b3d5a6c) will increase coverage by 0.04%.
The diff coverage is 89.13%.

❗ Current head 3d86e82 differs from pull request most recent head 8541ce3. Consider uploading reports for the commit 8541ce3 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2563      +/-   ##
==========================================
+ Coverage   72.27%   72.32%   +0.04%     
==========================================
  Files         105      105              
  Lines       11753    11785      +32     
==========================================
+ Hits         8495     8524      +29     
- Misses       3258     3261       +3     
Files Changed Coverage Δ
scanpy/preprocessing/_normalization.py 88.88% <ø> (ø)
scanpy/testing/_pytest/marks.py 84.61% <ø> (ø)
scanpy/preprocessing/_pca.py 94.30% <89.13%> (-0.94%) ⬇️

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks straightforward enough!

  1. please make sure the docs look right and you are using links. You probably have to add a line to intersphinx_mapping in docs/conf.py to be able to link to the dask-ml docs.
  2. the logic seems straightforward, but the user should know when which svd_solver argument works, as that part is getting complicated.
  3. you have a lot of duplicated lines that can just be moved to after their if/else block to deduplicate them

pyproject.toml Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
selmanozleyen and others added 2 commits July 26, 2023 15:25
Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not zero_center or random_state or svd_solver != 'arpack':
logg.debug('Ignoring zero_center, random_state, svd_solver')

at this point, svd_solver is None, right? Please make this line work with both None and the explicit value

scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
@flying-sheep flying-sheep mentioned this pull request Aug 4, 2023
19 tasks
@selmanozleyen
Copy link
Member Author

@flying-sheep , haven't considered all combinations yet but wanted to check if this way is good for testing warnings.

@flying-sheep
Copy link
Member

Looks great! You can get rid of expect_warning, as you can just simply check if expected_warning_message is not None instead, otherwise pretty ideal!

@flying-sheep
Copy link
Member

You also need to fix the links. dask.arrays.linalg.svd_compressed should be dask.arrays.linalg.svd_compressed, then the docs will build.

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great! once the docs build (and you’re happy with it), that’s ready

@flying-sheep flying-sheep added this to the 1.10.0 milestone Aug 21, 2023
@flying-sheep flying-sheep enabled auto-merge (squash) August 21, 2023 11:34
@flying-sheep flying-sheep disabled auto-merge August 21, 2023 11:37
@flying-sheep flying-sheep enabled auto-merge (squash) August 21, 2023 13:36
@flying-sheep flying-sheep merged commit 2b56e00 into scverse:master Aug 21, 2023
@selmanozleyen selmanozleyen deleted the dask-pca branch August 23, 2023 04:15
flying-sheep added a commit that referenced this pull request Sep 7, 2023
Co-authored-by: Philipp A. <[email protected]>
Co-authored-by: Philipp A <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants