diff --git a/docs/contributors.md b/docs/contributors.md
index a9b2d79e3c..2e9c62d255 100644
--- a/docs/contributors.md
+++ b/docs/contributors.md
@@ -1,22 +1,25 @@
 # Contributors
 
 [anndata graph](https://github.com/scverse/anndata/graphs/contributors>) | [scanpy graph](https://github.com/scverse/scanpy/graphs/contributors)| ☀ = maintainer
+
 ## Current developers
 
-- [Isaac Virshup](https://github.com/ivirshup), lead developer since 2019 ☀
-- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions ☀
-- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions ☀
-- [Fidel Ramirez](https://github.com/fidelram) developer, plotting ☀
-- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
-- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
+- [Philipp Angerer](https://github.com/flying-sheep), lead developer since 2023, software quality, initial anndata conception ☀
+- [Ilan Gold](https://github.com/ilan-gold), developer, Dask ☀
+- [Severin Dicks](https://github.com/SeverinDicks), developer, performance ☀
 - [Lukas Heumos](https://twitter.com/LukasHeumos), developer, diverse contributions
-- [Philipp Angerer](https://github.com/flying-sheep), developer, software quality, initial anndata conception ☀
 
 ## Other roles
 
+- [Isaac Virshup](https://github.com/ivirshup), lead developer 2019-2023
 - [Alex Wolf](https://twitter.com/falexwolf): lead developer 2016-2019, initial anndata & scanpy conception
 - [Fabian Theis](https://twitter.com/fabian_theis) & lab: enabling guidance, support and environment
 
 ## Former developers
 
-- Tom White: developer 2018-2019, distributed computing
+- [Tom White](https://github.com/tomwhite): developer 2018-2019, distributed computing
+- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions
+- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions
+- [Fidel Ramirez](https://github.com/fidelram) developer, plotting
+- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
+- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
diff --git a/docs/dev/code.md b/docs/dev/code.md
index 1e9d295725..3ca393c8f7 100644
--- a/docs/dev/code.md
+++ b/docs/dev/code.md
@@ -9,6 +9,7 @@
 5. {ref}`Make sure all tests are passing `
 6. {ref}`Build and visually check any changed documentation `
 7. {ref}`Open a PR back to the main repository `
+8. {ref}`Add a release note to your PR `
 
 ## Code style
 
diff --git a/docs/dev/documentation.md b/docs/dev/documentation.md
index d9c3f6e034..dcad9533ed 100644
--- a/docs/dev/documentation.md
+++ b/docs/dev/documentation.md
@@ -12,10 +12,12 @@
 Sometimes these caches are not invalidated when you've updated the docs.
 If docs are not updating the way you expect, first try "force reloading" your browser page – e.g. reload the page without using the cache.
 Next, if problems persist, clear the sphinx cache (`hatch run docs:clean`) and try building them again.
+(adding-to-the-docs)=
+
 ## Adding to the docs
 
 For any user-visible changes, please make sure a note has been added to the release notes using [`hatch run towncrier:create`][towncrier create].
-We recommend waiting on this until your PR is close to done since this can often causes merge conflicts.
+When asked for “Issue number (`+` if none)”, enter the *PR number* instead.
 
 Once you've added a new function to the documentation, you'll need to make sure there is a link somewhere in the documentation site pointing to it.
 This should be added to `docs/api.md` under a relevant heading.
diff --git a/docs/release-notes/1.11.0.md b/docs/release-notes/1.11.0.md
index c7258ea271..a41103e0ec 100644
--- a/docs/release-notes/1.11.0.md
+++ b/docs/release-notes/1.11.0.md
@@ -6,14 +6,14 @@
 - {func}`~scanpy.pp.sample` supports both upsampling and downsampling of observations and variables. {func}`~scanpy.pp.subsample` is now deprecated. {smaller}`G Eraslan & P Angerer` ({pr}`943`)
 - Add `layer` argument to {func}`scanpy.tl.score_genes` and {func}`scanpy.tl.score_genes_cell_cycle` {smaller}`L Zappia` ({pr}`2921`)
 - Prevent `raw` conflict with `layer` in {func}`~scanpy.tl.score_genes` {smaller}`S Dicks` ({pr}`3155`)
-- Add support for `median` as an aggregation function to the `Aggregation` class in `scanpy.get._aggregated.py`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
+- Add support for `median` as an aggregation function to {func}`~scanpy.get.aggregate`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
 - Add `key_added` argument to {func}`~scanpy.pp.pca`, {func}`~scanpy.tl.tsne` and {func}`~scanpy.tl.umap` {smaller}`P Angerer` ({pr}`3184`)
 - Support running {func}`scanpy.pp.pca` on sparse Dask arrays with the `'covariance_eigh'` solver {smaller}`P Angerer` ({pr}`3263`)
-- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
+- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see scikit-learn {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
 - Add explicit support to {func}`scanpy.pp.pca` for `svd_solver='covariance_eigh'` {smaller}`P Angerer` ({pr}`3296`)
-- Add support {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
+- Add support for {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
 - Support `layer` parameter in {func}`scanpy.pl.highest_expr_genes` {smaller}`P Angerer` ({pr}`3324`)
-- Run numba functions single-threaded when called from inside of a ThreadPool {smaller}`P Angerer` ({pr}`3335`)
+- Run numba functions single-threaded when called from inside of a {class}`~multiprocessing.pool.ThreadPool` {smaller}`P Angerer` ({pr}`3335`)
 - Switch {func}`~scanpy.logging.print_header` and {func}`~scanpy.logging.print_versions` to {mod}`session_info2` {smaller}`P Angerer` ({pr}`3384`)
 - Add sampling probabilities/mask parameter `p` to {func}`~scanpy.pp.sample` {smaller}`P Angerer` ({pr}`3410`)
 
diff --git a/docs/release-notes/3426.bugfix.md b/docs/release-notes/3426.bugfix.md
new file mode 100644
index 0000000000..4565f1ee35
--- /dev/null
+++ b/docs/release-notes/3426.bugfix.md
@@ -0,0 +1 @@
+Fix {func}`~scanpy.tl.rank_genes_groups` compatibility with data >10M cells {smaller}`P Angerer`
diff --git a/pyproject.toml b/pyproject.toml
index 00d2c9062c..770400ff00 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -22,9 +22,9 @@ authors = [
     { name = "Andrés R. Muñoz-Rojas" },
 ]
 maintainers = [
-    { name = "Isaac Virshup", email = "ivirshup@gmail.com" },
     { name = "Philipp Angerer", email = "phil.angerer@gmail.com" },
-    { name = "Alex Wolf", email = "f.alex.wolf@gmx.de" },
+    { name = "Ilan Gold" },
+    { name = "Severin Dicks" },
 ]
 readme = "README.md"
 classifiers = [
@@ -70,12 +70,14 @@ dependencies = [
 ]
 dynamic = [ "version" ]
 
+# https://docs.pypi.org/project_metadata/#project-urls
 [project.urls]
 Documentation = "https://scanpy.readthedocs.io/"
 Source = "https://github.com/scverse/scanpy"
-Home-page = "https://scanpy.org"
+Homepage = "https://scanpy.org"
 Discourse = "https://discourse.scverse.org/c/help/scanpy/37"
-Twitter = "https://twitter.com/scverse_team"
+Bluesky = "https://bsky.app/profile/scverse.bsky.social"
+Twitter = "https://x.com/scverse_team"
 
 [project.scripts]
 scanpy = "scanpy.cli:console_main"
diff --git a/src/scanpy/preprocessing/_simple.py b/src/scanpy/preprocessing/_simple.py
index 821615676a..ac68edd376 100644
--- a/src/scanpy/preprocessing/_simple.py
+++ b/src/scanpy/preprocessing/_simple.py
@@ -885,7 +885,7 @@ def sample(
         Rows correspond to cells and columns to genes.
     fraction
         Sample to this `fraction` of the number of observations or variables.
-        (All of them, even if there are `0`s/`False`s in `p`.)
+        (All of them, even if there are `0`\\ s/`False`\\ s in `p`.)
         This can be larger than 1.0, if `replace=True`.
         See `axis` and `replace`.
     n
diff --git a/src/scanpy/tools/_rank_genes_groups.py b/src/scanpy/tools/_rank_genes_groups.py
index 2c214fcfdd..cafb78c6f1 100644
--- a/src/scanpy/tools/_rank_genes_groups.py
+++ b/src/scanpy/tools/_rank_genes_groups.py
@@ -2,7 +2,6 @@
 
 from __future__ import annotations
 
-from math import floor
 from typing import TYPE_CHECKING, Literal
 
 import numpy as np
@@ -32,6 +31,8 @@
 # Used with get_literal_vals
 _Method = Literal["logreg", "t-test", "wilcoxon", "t-test_overestim_var"]
 
+_CONST_MAX_SIZE = 10000000
+
 
 def _select_top_n(scores: NDArray, n_top: int):
     n_from = scores.shape[0]
@@ -47,9 +48,7 @@ def _ranks(
     X: np.ndarray | sparse.csr_matrix | sparse.csc_matrix,
     mask_obs: NDArray[np.bool_] | None = None,
     mask_obs_rest: NDArray[np.bool_] | None = None,
-):
-    CONST_MAX_SIZE = 10000000
-
+) -> Generator[tuple[pd.DataFrame, int, int], None, None]:
     n_genes = X.shape[1]
 
     if issparse(X):
@@ -71,7 +70,7 @@ def _ranks(
         get_chunk = lambda X, left, right: adapt(X[:, left:right])
 
     # Calculate chunk frames
-    max_chunk = floor(CONST_MAX_SIZE / n_cells)
+    max_chunk = max(_CONST_MAX_SIZE // n_cells, 1)
 
     for left in range(0, n_genes, max_chunk):
         right = min(left + max_chunk, n_genes)
@@ -81,7 +80,7 @@ def _ranks(
         yield ranks, left, right
 
 
-def _tiecorrect(ranks):
+def _tiecorrect(ranks: pd.DataFrame) -> np.float64:
     size = np.float64(ranks.shape[0])
     if size < 2:
         return np.repeat(ranks.shape[1], 1.0)
diff --git a/tests/test_rank_genes_groups.py b/tests/test_rank_genes_groups.py
index a36e6b14f1..788c7e705d 100644
--- a/tests/test_rank_genes_groups.py
+++ b/tests/test_rank_genes_groups.py
@@ -307,6 +307,13 @@ def test_wilcoxon_tie_correction(reference):
     np.testing.assert_allclose(test_obj.stats[groups[0]]["pvals"], pvals)
 
 
+def test_wilcoxon_huge_data(monkeypatch):
+    max_size = 300
+    adata = pbmc68k_reduced()
+    monkeypatch.setattr(sc.tl._rank_genes_groups, "_CONST_MAX_SIZE", max_size)
+    rank_genes_groups(adata, groupby="bulk_labels", method="wilcoxon")
+
+
 @pytest.mark.parametrize(
     ("n_genes_add", "n_genes_out_add"),
     [pytest.param(0, 0, id="equal"), pytest.param(2, 1, id="more")],
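
Note on the `_ranks` chunking change in `src/scanpy/tools/_rank_genes_groups.py`: with the old expression `floor(CONST_MAX_SIZE / n_cells)`, any input with more than 10,000,000 cells yields a chunk width of 0, and `range(0, n_genes, 0)` then raises `ValueError`. The sketch below isolates only that arithmetic so both behaviours can be compared. `chunk_bounds`, `MAX_SIZE`, and the `clamp` flag are hypothetical names used for illustration and are not part of scanpy.

```python
from math import floor

# Mirrors the value of _CONST_MAX_SIZE in the diff (illustrative copy).
MAX_SIZE = 10_000_000


def chunk_bounds(
    n_cells: int, n_genes: int, *, clamp: bool
) -> list[tuple[int, int]]:
    """Return the (left, right) gene-column bounds of each chunk.

    clamp=True  uses the new expression: max(MAX_SIZE // n_cells, 1)
    clamp=False uses the old expression: floor(MAX_SIZE / n_cells)
    """
    if clamp:
        max_chunk = max(MAX_SIZE // n_cells, 1)
    else:
        max_chunk = floor(MAX_SIZE / n_cells)
    # range() raises ValueError when its step (max_chunk) is 0.
    return [
        (left, min(left + max_chunk, n_genes))
        for left in range(0, n_genes, max_chunk)
    ]


if __name__ == "__main__":
    # 20M cells: the clamped width is 1, so genes are processed one column at a time.
    print(chunk_bounds(20_000_000, 3, clamp=True))
    try:
        chunk_bounds(20_000_000, 3, clamp=False)
    except ValueError as err:
        print("old expression fails:", err)
```

The new test in the diff takes the same route from the other side: instead of building a dataset with more than 10M cells, it monkeypatches `_CONST_MAX_SIZE` down to 300 so that a small dataset exercises the clamped path.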