Merge branch 'main' into pa/fmt
flying-sheep committed Jan 9, 2025
2 parents 1d636bb + e71dc55 commit 83515fc
Showing 9 changed files with 39 additions and 24 deletions.
19 changes: 11 additions & 8 deletions docs/contributors.md
@@ -1,22 +1,25 @@
# Contributors

[anndata graph](https://github.com/scverse/anndata/graphs/contributors) | [scanpy graph](https://github.com/scverse/scanpy/graphs/contributors) | ☀ = maintainer

## Current developers

- [Isaac Virshup](https://github.com/ivirshup), lead developer since 2019 ☀
- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions ☀
- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions ☀
- [Fidel Ramirez](https://github.com/fidelram), developer, plotting ☀
- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
- [Philipp Angerer](https://github.com/flying-sheep), lead developer since 2023, software quality, initial anndata conception ☀
- [Ilan Gold](https://github.com/ilan-gold), developer, Dask ☀
- [Severin Dicks](https://github.com/SeverinDicks), developer, performance ☀
- [Lukas Heumos](https://twitter.com/LukasHeumos), developer, diverse contributions
- [Philipp Angerer](https://github.com/flying-sheep), developer, software quality, initial anndata conception ☀

## Other roles

- [Isaac Virshup](https://github.com/ivirshup), lead developer 2019-2023
- [Alex Wolf](https://twitter.com/falexwolf): lead developer 2016-2019, initial anndata & scanpy conception
- [Fabian Theis](https://twitter.com/fabian_theis) & lab: enabling guidance, support and environment

## Former developers

- Tom White: developer 2018-2019, distributed computing
- [Tom White](https://github.com/tomwhite): developer 2018-2019, distributed computing
- [Gökcen Eraslan](https://twitter.com/gokcen), developer, diverse contributions
- [Sergei Rybakov](https://github.com/Koncopd), developer, diverse contributions
- [Fidel Ramirez](https://github.com/fidelram), developer, plotting
- [Giovanni Palla](https://twitter.com/g_palla1), developer, spatial data
- [Malte Luecken](https://twitter.com/MDLuecken), developer, community & forum
1 change: 1 addition & 0 deletions docs/dev/code.md
@@ -9,6 +9,7 @@
5. {ref}`Make sure all tests are passing <tests>`
6. {ref}`Build and visually check any changed documentation <building-the-docs>`
7. {ref}`Open a PR back to the main repository <open-a-pr>`
8. {ref}`Add a release note to your PR <adding-to-the-docs>`

## Code style

4 changes: 3 additions & 1 deletion docs/dev/documentation.md
@@ -12,10 +12,12 @@ Sometimes these caches are not invalidated when you've updated the docs.
If docs are not updating the way you expect, first try "force reloading" your browser page – e.g. reload the page without using the cache.
Next, if problems persist, clear the sphinx cache (`hatch run docs:clean`) and try building them again.

(adding-to-the-docs)=

## Adding to the docs

For any user-visible changes, please make sure a note has been added to the release notes using [`hatch run towncrier:create`][towncrier create].
We recommend waiting on this until your PR is close to done since this can often cause merge conflicts.
When asked for “Issue number (`+` if none)”, enter the *PR number* instead.

Once you've added a new function to the documentation, you'll need to make sure there is a link somewhere in the documentation site pointing to it.
This should be added to `docs/api.md` under a relevant heading.
8 changes: 4 additions & 4 deletions docs/release-notes/1.11.0.md
@@ -6,14 +6,14 @@
- {func}`~scanpy.pp.sample` supports both upsampling and downsampling of observations and variables. {func}`~scanpy.pp.subsample` is now deprecated. {smaller}`G Eraslan & P Angerer` ({pr}`943`)
- Add `layer` argument to {func}`scanpy.tl.score_genes` and {func}`scanpy.tl.score_genes_cell_cycle` {smaller}`L Zappia` ({pr}`2921`)
- Prevent `raw` conflict with `layer` in {func}`~scanpy.tl.score_genes` {smaller}`S Dicks` ({pr}`3155`)
- Add support for `median` as an aggregation function to the `Aggregation` class in `scanpy.get._aggregated.py`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
- Add support for `median` as an aggregation function to {func}`~scanpy.get.aggregate`. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation {smaller}`M Dehkordi (Farhad)` ({pr}`3180`)
- Add `key_added` argument to {func}`~scanpy.pp.pca`, {func}`~scanpy.tl.tsne` and {func}`~scanpy.tl.umap` {smaller}`P Angerer` ({pr}`3184`)
- Support running {func}`scanpy.pp.pca` on sparse Dask arrays with the `'covariance_eigh'` solver {smaller}`P Angerer` ({pr}`3263`)
- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
- Use upstreamed {class}`~sklearn.decomposition.PCA` implementation for {class}`~scipy.sparse.csr_array` and {class}`~scipy.sparse.csr_matrix` (see scikit-learn {ref}`sklearn:changes_1_4`) {smaller}`P Angerer` ({pr}`3267`)
- Add explicit support to {func}`scanpy.pp.pca` for `svd_solver='covariance_eigh'` {smaller}`P Angerer` ({pr}`3296`)
- Add support {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
- Add support for {class}`dask.array.Array` to {func}`scanpy.pp.calculate_qc_metrics` {smaller}`I Gold` ({pr}`3307`)
- Support `layer` parameter in {func}`scanpy.pl.highest_expr_genes` {smaller}`P Angerer` ({pr}`3324`)
- Run numba functions single-threaded when called from inside of a ThreadPool {smaller}`P Angerer` ({pr}`3335`)
- Run numba functions single-threaded when called from inside of a {class}`~multiprocessing.pool.ThreadPool` {smaller}`P Angerer` ({pr}`3335`)
- Switch {func}`~scanpy.logging.print_header` and {func}`~scanpy.logging.print_versions` to {mod}`session_info2` {smaller}`P Angerer` ({pr}`3384`)
- Add sampling probabilities/mask parameter `p` to {func}`~scanpy.pp.sample` {smaller}`P Angerer` ({pr}`3410`)
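
To make the sampling and `key_added` additions listed above concrete, here is a minimal sketch. It uses only the parameter names given in these notes (`fraction`, `p`, `key_added`); the probability vector and the custom key name are illustrative assumptions, so check the released 1.11 API for the exact behaviour.

```python
import numpy as np
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()

# Downsample to half the observations, weighting the draw with `p`
# (zero entries are never picked); this weighting is an illustrative assumption.
p = np.ones(adata.n_obs)
p[:10] = 0.0
sc.pp.sample(adata, fraction=0.5, p=p / p.sum())

# Store the PCA embedding under a custom key instead of the default one.
sc.pp.pca(adata, key_added="X_pca_sampled")
```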

1 change: 1 addition & 0 deletions docs/release-notes/3426.bugfix.md
@@ -0,0 +1 @@
Fix {func}`~scanpy.tl.rank_genes_groups` compatibility with data >10M cells {smaller}`P Angerer`
10 changes: 6 additions & 4 deletions pyproject.toml
@@ -22,9 +22,9 @@ authors = [
{ name = "Andrés R. Muñoz-Rojas" },
]
maintainers = [
{ name = "Isaac Virshup", email = "[email protected]" },
{ name = "Philipp Angerer", email = "[email protected]" },
{ name = "Alex Wolf", email = "[email protected]" },
{ name = "Ilan Gold" },
{ name = "Severin Dicks" },
]
readme = "README.md"
classifiers = [
@@ -70,12 +70,14 @@ dependencies = [
]
dynamic = [ "version" ]

# https://docs.pypi.org/project_metadata/#project-urls
[project.urls]
Documentation = "https://scanpy.readthedocs.io/"
Source = "https://github.com/scverse/scanpy"
Home-page = "https://scanpy.org"
Homepage = "https://scanpy.org"
Discourse = "https://discourse.scverse.org/c/help/scanpy/37"
Twitter = "https://twitter.com/scverse_team"
Bluesky = "https://bsky.app/profile/scverse.bsky.social"
Twitter = "https://x.com/scverse_team"

[project.scripts]
scanpy = "scanpy.cli:console_main"
2 changes: 1 addition & 1 deletion src/scanpy/preprocessing/_simple.py
@@ -885,7 +885,7 @@ def sample(
Rows correspond to cells and columns to genes.
fraction
Sample to this `fraction` of the number of observations or variables.
(All of them, even if there are `0`s/`False`s in `p`.)
(All of them, even if there are `0`\\ s/`False`\\ s in `p`.)
This can be larger than 1.0, if `replace=True`.
See `axis` and `replace`.
n
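The `fraction`/`replace` wording in the docstring above can be illustrated with a short, hedged usage sketch; the `axis` and `copy` keywords follow the function's documented options, but exact defaults should be verified against the installed version.

```python
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()

# `fraction` may exceed 1.0 only when sampling with replacement.
upsampled = sc.pp.sample(adata, fraction=1.5, replace=True, copy=True)
assert upsampled.n_obs > adata.n_obs

# Sampling along the variable axis instead of observations.
fewer_genes = sc.pp.sample(adata, n=100, axis="var", copy=True)
assert fewer_genes.n_vars == 100
```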
11 changes: 5 additions & 6 deletions src/scanpy/tools/_rank_genes_groups.py
@@ -2,7 +2,6 @@

from __future__ import annotations

from math import floor
from typing import TYPE_CHECKING, Literal

import numpy as np
@@ -32,6 +31,8 @@
# Used with get_literal_vals
_Method = Literal["logreg", "t-test", "wilcoxon", "t-test_overestim_var"]

_CONST_MAX_SIZE = 10000000


def _select_top_n(scores: NDArray, n_top: int):
n_from = scores.shape[0]
@@ -47,9 +48,7 @@ def _ranks(
X: np.ndarray | sparse.csr_matrix | sparse.csc_matrix,
mask_obs: NDArray[np.bool_] | None = None,
mask_obs_rest: NDArray[np.bool_] | None = None,
):
CONST_MAX_SIZE = 10000000

) -> Generator[tuple[pd.DataFrame, int, int], None, None]:
n_genes = X.shape[1]

if issparse(X):
@@ -71,7 +70,7 @@
get_chunk = lambda X, left, right: adapt(X[:, left:right])

# Calculate chunk frames
max_chunk = floor(CONST_MAX_SIZE / n_cells)
max_chunk = max(_CONST_MAX_SIZE // n_cells, 1)

for left in range(0, n_genes, max_chunk):
right = min(left + max_chunk, n_genes)
@@ -81,7 +80,7 @@
yield ranks, left, right


def _tiecorrect(ranks):
def _tiecorrect(ranks: pd.DataFrame) -> np.float64:
size = np.float64(ranks.shape[0])
if size < 2:
return np.repeat(ranks.shape[1], 1.0)
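The diff above replaces `floor(CONST_MAX_SIZE / n_cells)` with `max(_CONST_MAX_SIZE // n_cells, 1)`. A small self-contained sketch (independent of scanpy, with made-up cell and gene counts) shows why the clamp matters: once `n_cells` exceeds the constant, the old chunk width rounds down to 0 and `range(0, n_genes, 0)` raises a `ValueError`, which is the >10M-cell failure fixed here.

```python
from math import floor

_CONST_MAX_SIZE = 10_000_000
n_cells, n_genes = 12_000_000, 2_000  # illustrative sizes beyond the constant

old_chunk = floor(_CONST_MAX_SIZE / n_cells)    # 0 -> range() with step 0 raises
new_chunk = max(_CONST_MAX_SIZE // n_cells, 1)  # clamped to at least one gene

chunks = [
    (left, min(left + new_chunk, n_genes))
    for left in range(0, n_genes, new_chunk)
]
print(old_chunk, new_chunk, len(chunks))  # 0 1 2000
```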
7 changes: 7 additions & 0 deletions tests/test_rank_genes_groups.py
@@ -307,6 +307,13 @@ def test_wilcoxon_tie_correction(reference):
np.testing.assert_allclose(test_obj.stats[groups[0]]["pvals"], pvals)


def test_wilcoxon_huge_data(monkeypatch):
max_size = 300
adata = pbmc68k_reduced()
monkeypatch.setattr(sc.tl._rank_genes_groups, "_CONST_MAX_SIZE", max_size)
rank_genes_groups(adata, groupby="bulk_labels", method="wilcoxon")


@pytest.mark.parametrize(
("n_genes_add", "n_genes_out_add"),
[pytest.param(0, 0, id="equal"), pytest.param(2, 1, id="more")],
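The new `test_wilcoxon_huge_data` relies on pytest's `monkeypatch` fixture to shrink the module-level `_CONST_MAX_SIZE`, so the chunked code path runs on a small dataset. A generic, self-contained sketch of that pattern (using the standard-library `math` module purely as a stand-in target) looks like this:

```python
import math

def test_monkeypatched_constant(monkeypatch):
    # Replace a module-level constant for this test only;
    # pytest restores the original value at teardown.
    monkeypatch.setattr(math, "pi", 3.0)
    assert math.pi == 3.0
```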
