Merge branch 'main' into pa/fmt
flying-sheep committed Jan 9, 2025
2 parents 1d636bb + e71dc55 commit 83515fc
Showing 9 changed files with 39 additions and 24 deletions.
19 changes: 11 additions & 8 deletions docs/contributors.md
@@ -1,22 +1,25 @@
# Contributors

[anndata graph](https://github.com/scverse/anndata/graphs/contributors) | [scanpy graph](https://github.com/scverse/scanpy/graphs/contributors) | ☀ = maintainer

## Current developers

- [Isaac Virshup](https://github.com/ivirshup), lead developer since 2019 ☀
- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions ☀
- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions ☀
- [Fidel Ramirez](https://github.com/fidelram), developer, plotting ☀
- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
- [Philipp Angerer](https://github.com/flying-sheep), lead developer since 2023, software quality, initial anndata conception ☀
- [Ilan Gold](https://github.com/ilan-gold), developer, Dask ☀
- [Severin Dicks](https://github.com/SeverinDicks), developer, performance ☀
- [Lukas Heumos](https://twitter.com/LukasHeumos), developer, diverse contributions
- [Philipp Angerer](https://github.com/flying-sheep), developer, software quality, initial anndata conception ☀

## Other roles

- [Isaac Virshup](https://github.com/ivirshup), lead developer 2019-2023
- [Alex Wolf](https://twitter.com/falexwolf): lead developer 2016-2019, initial anndata & scanpy conception
- [Fabian Theis](https://twitter.com/fabian_theis) & lab: enabling guidance, support and environment

## Former developers

- Tom White: developer 2018-2019, distributed computing
- [Tom White](https://github.com/tomwhite): developer 2018-2019, distributed computing
- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions
- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions
- [Fidel Ramirez](https://github.com/fidelram), developer, plotting
- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
1 change: 1 addition & 0 deletions docs/dev/code.md
@@ -9,6 +9,7 @@
5. {ref}`Make sure all tests are passing <tests>`
6. {ref}`Build and visually check any changed documentation <building-the-docs>`
7. {ref}`Open a PR back to the main repository <open-a-pr>`
8. {ref}`Add a release note to your PR <adding-to-the-docs>`

## Code style

4 changes: 3 additions & 1 deletion docs/dev/documentation.md
@@ -12,10 +12,12 @@ Sometimes these caches are not invalidated when you've updated the docs.
If docs are not updating the way you expect, first try "force reloading" your browser page – e.g. reload the page without using the cache.
Next, if problems persist, clear the sphinx cache (`hatch run docs:clean`) and try building them again.

(adding-to-the-docs)=

## Adding to the docs

For any user-visible changes, please make sure a note has been added to the release notes using [`hatch run towncrier:create`][towncrier create].
We recommend waiting on this until your PR is close to done since this can often cause merge conflicts.
When asked for “Issue number (`+` if none)”, enter the *PR number* instead.

Once you've added a new function to the documentation, you'll need to make sure there is a link somewhere in the documentation site pointing to it.
This should be added to `docs/api.md` under a relevant heading.
8 changes: 4 additions & 4 deletions docs/release-notes/1.11.0.md
@@ -6,14 +6,14 @@
- {func}`~scanpy.pp.sample` supports both upsampling and downsampling of observations and variables. {func}`~scanpy.pp.subsample` is now deprecated. {smaller}`G Eraslan & P Angerer` ({pr}`943`)
- Add `layer` argument to {func}`scanpy.tl.score_genes` and {func}`scanpy.tl.score_genes_cell_cycle` {smaller}`L Zappia` ({pr}`2921`)
- Prevent `raw` conflict with `layer` in {func}`~scanpy.tl.score_genes` {smaller}`S Dicks` ({pr}`3155`)
- Add support for `median` as an aggregation function to the `Aggregation` class in `scanpy.get._aggregated.py`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
- Add support for `median` as an aggregation function to {func}`~scanpy.get.aggregate`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
- Add `key_added` argument to {func}`~scanpy.pp.pca`, {func}`~scanpy.tl.tsne` and {func}`~scanpy.tl.umap` {smaller}`P Angerer` ({pr}`3184`)
- Support running {func}`scanpy.pp.pca` on sparse Dask arrays with the `'covariance_eigh'` solver {smaller}`P Angerer` ({pr}`3263`)
- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see scikit-learn {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
- Add explicit support to {func}`scanpy.pp.pca` for `svd_solver='covariance_eigh'` {smaller}`P Angerer` ({pr}`3296`)
- Add support {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
- Add support for {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
- Support `layer` parameter in {func}`scanpy.pl.highest_expr_genes` {smaller}`P Angerer` ({pr}`3324`)
- Run numba functions single-threaded when called from inside of a ThreadPool {smaller}`P Angerer` ({pr}`3335`)
- Run numba functions single-threaded when called from inside of a {class}`~multiprocessing.pool.ThreadPool` {smaller}`P Angerer` ({pr}`3335`)
- Switch {func}`~scanpy.logging.print_header` and {func}`~scanpy.logging.print_versions` to {mod}`session_info2` {smaller}`P Angerer` ({pr}`3384`)
- Add sampling probabilities/mask parameter `p` to {func}`~scanpy.pp.sample` {smaller}`P Angerer` ({pr}`3410`)
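
To make the sampling and `key_added` additions listed above concrete, here is a minimal sketch. It uses only the parameter names given in these notes (`fraction`, `p`, `key_added`); the probability vector and the custom key name are illustrative assumptions, so check the released 1.11 API for the exact behaviour.

```python
import numpy as np
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()

# Downsample to half the observations, weighting the draw with `p`
# (zero entries are never picked); this weighting is an illustrative assumption.
p = np.ones(adata.n_obs)
p[:10] = 0.0
sc.pp.sample(adata, fraction=0.5, p=p / p.sum())

# Store the PCA embedding under a custom key instead of the default one.
sc.pp.pca(adata, key_added="X_pca_sampled")
```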

1 change: 1 addition & 0 deletions docs/release-notes/3426.bugfix.md
@@ -0,0 +1 @@
Fix {func}`~scanpy.tl.rank_genes_groups` compatibility with data >10M cells {smaller}`P Angerer`
10 changes: 6 additions & 4 deletions pyproject.toml
@@ -22,9 +22,9 @@ authors = [
{ name = "Andrés R. Muñoz-Rojas" },
]
maintainers = [
{ name = "Isaac Virshup", email = "[email protected]" },
{ name = "Philipp Angerer", email = "[email protected]" },
{ name = "Alex Wolf", email = "[email protected]" },
{ name = "Ilan Gold" },
{ name = "Severin Dicks" },
]
readme = "README.md"
classifiers = [
@@ -70,12 +70,14 @@ dependencies = [
]
dynamic = [ "version" ]

# https://docs.pypi.org/project_metadata/#project-urls
[project.urls]
Documentation = "https://scanpy.readthedocs.io/"
Source = "https://github.com/scverse/scanpy"
Home-page = "https://scanpy.org"
Homepage = "https://scanpy.org"
Discourse = "https://discourse.scverse.org/c/help/scanpy/37"
Twitter = "https://twitter.com/scverse_team"
Bluesky = "https://bsky.app/profile/scverse.bsky.social"
Twitter = "https://x.com/scverse_team"

[project.scripts]
scanpy = "scanpy.cli:console_main"
2 changes: 1 addition & 1 deletion src/scanpy/preprocessing/_simple.py
@@ -885,7 +885,7 @@ def sample(
Rows correspond to cells and columns to genes.
fraction
Sample to this `fraction` of the number of observations or variables.
(All of them, even if there are `0`s/`False`s in `p`.)
(All of them, even if there are `0`\\ s/`False`\\ s in `p`.)
This can be larger than 1.0, if `replace=True`.
See `axis` and `replace`.
n
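The `fraction`/`replace` wording in the docstring above can be illustrated with a short, hedged usage sketch; the `axis` and `copy` keywords follow the function's documented options, but exact defaults should be verified against the installed version.

```python
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()

# `fraction` may exceed 1.0 only when sampling with replacement.
upsampled = sc.pp.sample(adata, fraction=1.5, replace=True, copy=True)
assert upsampled.n_obs > adata.n_obs

# Sampling along the variable axis instead of observations.
fewer_genes = sc.pp.sample(adata, n=100, axis="var", copy=True)
assert fewer_genes.n_vars == 100
```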
11 changes: 5 additions & 6 deletions src/scanpy/tools/_rank_genes_groups.py
@@ -2,7 +2,6 @@

from __future__ import annotations

from math import floor
from typing import TYPE_CHECKING, Literal

import numpy as np
@@ -32,6 +31,8 @@
# Used with get_literal_vals
_Method = Literal["logreg", "t-test", "wilcoxon", "t-test_overestim_var"]

_CONST_MAX_SIZE = 10000000


def _select_top_n(scores: NDArray, n_top: int):
n_from = scores.shape[0]
@@ -47,9 +48,7 @@ def _ranks(
X: np.ndarray | sparse.csr_matrix | sparse.csc_matrix,
mask_obs: NDArray[np.bool_] | None = None,
mask_obs_rest: NDArray[np.bool_] | None = None,
):
CONST_MAX_SIZE = 10000000

) -> Generator[tuple[pd.DataFrame, int, int], None, None]:
n_genes = X.shape[1]

if issparse(X):
@@ -71,7 +70,7 @@
get_chunk = lambda X, left, right: adapt(X[:, left:right])

# Calculate chunk frames
max_chunk = floor(CONST_MAX_SIZE / n_cells)
max_chunk = max(_CONST_MAX_SIZE // n_cells, 1)

for left in range(0, n_genes, max_chunk):
right = min(left + max_chunk, n_genes)
@@ -81,7 +80,7 @@
yield ranks, left, right


def _tiecorrect(ranks):
def _tiecorrect(ranks: pd.DataFrame) -> np.float64:
size = np.float64(ranks.shape[0])
if size < 2:
return np.repeat(ranks.shape[1], 1.0)
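The diff above replaces `floor(CONST_MAX_SIZE / n_cells)` with `max(_CONST_MAX_SIZE // n_cells, 1)`. A small self-contained sketch (independent of scanpy, with made-up cell and gene counts) shows why the clamp matters: once `n_cells` exceeds the constant, the old chunk width rounds down to 0 and `range(0, n_genes, 0)` raises a `ValueError`, which is the >10M-cell failure fixed here.

```python
from math import floor

_CONST_MAX_SIZE = 10_000_000
n_cells, n_genes = 12_000_000, 2_000  # illustrative sizes beyond the constant

old_chunk = floor(_CONST_MAX_SIZE / n_cells)    # 0 -> range() with step 0 raises
new_chunk = max(_CONST_MAX_SIZE // n_cells, 1)  # clamped to at least one gene

chunks = [
    (left, min(left + new_chunk, n_genes))
    for left in range(0, n_genes, new_chunk)
]
print(old_chunk, new_chunk, len(chunks))  # 0 1 2000
```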
7 changes: 7 additions & 0 deletions tests/test_rank_genes_groups.py
@@ -307,6 +307,13 @@ def test_wilcoxon_tie_correction(reference):
np.testing.assert_allclose(test_obj.stats[groups[0]]["pvals"], pvals)


def test_wilcoxon_huge_data(monkeypatch):
max_size = 300
adata = pbmc68k_reduced()
monkeypatch.setattr(sc.tl._rank_genes_groups, "_CONST_MAX_SIZE", max_size)
rank_genes_groups(adata, groupby="bulk_labels", method="wilcoxon")


@pytest.mark.parametrize(
("n_genes_add", "n_genes_out_add"),
[pytest.param(0, 0, id="equal"), pytest.param(2, 1, id="more")],
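The new `test_wilcoxon_huge_data` relies on pytest's `monkeypatch` fixture to shrink the module-level `_CONST_MAX_SIZE`, so the chunked code path runs on a small dataset. A generic, self-contained sketch of that pattern (using the standard-library `math` module purely as a stand-in target) looks like this:

```python
import math

def test_monkeypatched_constant(monkeypatch):
    # Replace a module-level constant for this test only;
    # pytest restores the original value at teardown.
    monkeypatch.setattr(math, "pi", 3.0)
    assert math.pi == 3.0
```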
