hvg flavors seurat and cellranger with batch: bug in subset #3042

eroell · 2024-05-02T14:25:04Z

Closes #3027: Unexplainable `sc.pp.highly_variable_genes(subset = True)` behavior
Tests included.

Release notes added.

This PR fixes the bug reported in the linked issue.

A new test which spots the erroneous computations has been added.

I would like to use this chance to refactor `_highly_variable_genes.py` rather than just apply the two-line fix from the first commit. I think that handling the multi-batch HVG flagging differently for `seurat_v3` than for `seurat`/`cell_ranger` is what made this bug hard to spot in the first place.
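For the dispersion-based flavors (`seurat`, `cell_ranger`), the scanpy docs describe batched selection roughly as: genes are ranked by the number of batches in which they are highly variable, with ties broken by normalized dispersion. The following is a hypothetical, pandas-only sketch of one shared flagging step of that kind. It is not scanpy's implementation. `flag_batched_hvg`, its input format, and the averaging of per-batch dispersions are all assumptions. The design point it shows is that the final flags are assigned by gene label on a frame kept in the original gene order, so the boolean column can be used directly as a mask on `var`.

```python
import pandas as pd


def flag_batched_hvg(per_batch: dict[str, pd.DataFrame], n_top_genes: int) -> pd.DataFrame:
    """Hypothetical sketch: merge per-batch HVG results into one flag per gene.

    Each per-batch frame is indexed by gene name and has a boolean
    'highly_variable' column and a float 'dispersions_norm' column.
    """
    frames = list(per_batch.values())
    genes = frames[0].index  # keep the original gene order throughout

    # In how many batches was each gene flagged?
    nbatches = sum(f["highly_variable"].reindex(genes).astype(int) for f in frames)
    # Averaging dispersions across batches is an assumption of this sketch
    disp = sum(f["dispersions_norm"].reindex(genes) for f in frames) / len(frames)

    df = pd.DataFrame(
        {"highly_variable_nbatches": nbatches, "dispersions_norm": disp}, index=genes
    )
    # Rank by number of batches, break ties by normalized dispersion
    ranked = df.sort_values(
        ["highly_variable_nbatches", "dispersions_norm"], ascending=False
    )
    # Assign flags by gene label on the unsorted frame, so the boolean
    # column lines up with the original gene order
    df["highly_variable"] = df.index.isin(ranked.index[:n_top_genes])
    return df


genes = ["g1", "g2", "g3", "g4"]
per_batch = {
    "a": pd.DataFrame(
        {"highly_variable": [True, False, True, False],
         "dispersions_norm": [2.0, 0.5, 1.5, 0.1]},
        index=genes,
    ),
    "b": pd.DataFrame(
        {"highly_variable": [True, True, False, False],
         "dispersions_norm": [1.8, 1.2, 0.4, 0.2]},
        index=genes,
    ),
}
result = flag_batched_hvg(per_batch, n_top_genes=2)
```

Here `g1` is flagged in both batches, and `g3` beats `g2` on mean normalized dispersion, so `g1` and `g3` are selected while `result` keeps the order `g1..g4`.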

codecov · 2024-05-02T14:43:00Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.31%. Comparing base (896e249) to head (13b1f6c).
Report is 50 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #3042   +/-   ##
=======================================
  Coverage   76.31%   76.31%
=======================================
  Files         109      109
  Lines       12513    12515    +2
=======================================
+ Hits         9549     9551    +2
  Misses       2964     2964
```

| Files with missing lines | Coverage Δ |
|---|---|
| src/scanpy/preprocessing/_highly_variable_genes.py | `95.23% <100.00%> (+0.03%)` ⬆️ |
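The headline numbers follow from hits divided by lines. A small worked check, assuming the report truncates (rather than rounds) to two decimals. That is an inference from the numbers, not something the report states: conventional rounding of the head value would give 76.32%, yet both sides show 76.31%. `coverage_pct` is a hypothetical helper.

```python
import math


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Line coverage in percent, truncated (not rounded) to `precision` decimals.

    Truncation is an assumption about how the report formats percentages.
    """
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


base = coverage_pct(9549, 12513)  # main (896e249)
head = coverage_pct(9551, 12515)  # PR head (13b1f6c)

# The PR adds 2 lines and 2 hits, so the number of misses is unchanged
misses_base = 12513 - 9549
misses_head = 12515 - 9551

# Conventional rounding of the head value would differ from the report
head_rounded = round(9551 / 12515 * 100, 2)
```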

flying-sheep · 2024-05-03T11:09:54Z

Looks simple enough! Please deduplicate the tests though, they have too many identical lines.

eroell · 2024-05-21T06:30:07Z

> Please deduplicate the tests though, they have too many identical lines.

To do so, I wrote the across-setting checks as a for loop on top of the former test.

Would you prefer a separate test containing that for loop for the across-settings check?

flying-sheep

Oh, sorry for missing this. I always prefer `pytest.mark.parametrize` over for loops: that way it's immediately visible which variants of the test fail.
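A minimal sketch of the parametrized style, using a hypothetical toy function `top_n` instead of scanpy code. Stacking two `pytest.mark.parametrize` decorators makes pytest generate one test case per combination (here 2 × 2 = 4). Each case is collected and reported with its own ID, so a failure points at the exact variant.

```python
import pytest


def top_n(values: list[float], n: int) -> list[float]:
    """Hypothetical function under test: the n largest values, descending."""
    return sorted(values, reverse=True)[:n]


# Two stacked decorators make pytest generate the cross product of the
# arguments: 2 x 2 = 4 separately collected and reported test cases.
@pytest.mark.parametrize("shuffled", [True, False])
@pytest.mark.parametrize("n", [1, 2])
def test_top_n(n: int, shuffled: bool) -> None:
    values = [3.0, 1.0, 2.0] if shuffled else [1.0, 2.0, 3.0]
    assert top_n(values, n) == [3.0, 2.0][:n]
```

The trade-off raised in the next comment applies: each generated case runs in isolation, so results from different cases cannot be compared with each other.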

eroell · 2024-06-27T07:08:54Z

To make sure this bug doesn't come back, the selected `var_names` of all four combinations of `inplace=True/False` and `subset=True/False` need to be compared with each other. If these arguments were set via `pytest.mark.parametrize`, the AnnData objects produced for the different combinations would not exist at the same time and could not be compared.
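The point can be sketched with toy stand-ins. The `Data` class, `flag_hvg` and `selected_var_names` below are hypothetical and are not scanpy or anndata code. All four `subset`/`inplace` results are produced inside a single test run and kept in a dict keyed by the boolean pair, so they can be compared directly. Separate parametrized cases could not do this.

```python
import itertools
from dataclasses import dataclass

import pandas as pd


@dataclass
class Data:
    """Hypothetical minimal stand-in for an AnnData object: only `.var`."""
    var: pd.DataFrame


def flag_hvg(data: Data, n_top: int, *, subset: bool, inplace: bool):
    """Hypothetical HVG-like function with `subset`/`inplace` options."""
    var = data.var
    is_hv = var.index.isin(var["dispersions_norm"].nlargest(n_top).index)
    result = var.assign(highly_variable=is_hv)
    if subset:
        result = result[result["highly_variable"]]
    if inplace:
        data.var = result  # modify the passed object and return nothing
        return None
    return result


def selected_var_names(var: pd.DataFrame, n_top: int) -> dict:
    """Run all four subset/inplace combinations and keep every result alive."""
    selected = {}
    for subset, inplace in itertools.product([True, False], repeat=2):
        data = Data(var=var.copy())
        out = flag_hvg(data, n_top, subset=subset, inplace=inplace)
        frame = data.var if inplace else out
        # Names of the flagged genes (with subset=True, that is every row)
        selected[subset, inplace] = frame.index[frame["highly_variable"].to_numpy()]
    return selected


var = pd.DataFrame(
    {"dispersions_norm": [0.2, 1.5, 0.9, 2.1]},
    index=["g1", "g2", "g3", "g4"],
)
names = selected_var_names(var, n_top=2)
reference = names[False, False]
# The cross-check: every combination must select the same genes, in the same order
consistent = all(n.equals(reference) for n in names.values())
```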

I could either extract this specific section into a new, dedicated, smaller test:

    # check that the results are consistent for subset True/False: inplace True
    adata_post_subset = adatas["subset_False_inplace_True"][
        :, adatas["subset_False_inplace_True"].var["highly_variable"]
    ]
    assert adata_post_subset.var_names.equals(
        adatas["subset_True_inplace_True"].var_names
    )

    # check that the results are consistent for subset True/False: inplace False
    df_post_subset = dfs["subset_False_inplace_False"][
        dfs["subset_False_inplace_False"]["highly_variable"]
    ]
    assert df_post_subset.index.equals(dfs["subset_True_inplace_False"].index)

    # check that the results are consistent for inplace True/False: subset True
    assert adatas["subset_True_inplace_True"].var_names.equals(
        dfs["subset_True_inplace_False"].index
    )

or hardcode the features known to be selected and check against them.

What's your preference here?

flying-sheep · 2024-06-27T13:26:13Z

Ah, silly me, this makes sense.

Since some entries of the dicts were never used, I removed them and replaced the string keys with a bool (`subset=True/False`). This makes it all quite a bit more compact.

Also, please remember `itertools`: unless we're writing numba code, it's always preferable to nested for loops.
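To illustrate the `itertools` suggestion: `itertools.product` yields the same tuples, in the same order, as the equivalent nested for loops, with the last iterable varying fastest. The option lists below simply reuse names from this thread and are illustrative only.

```python
import itertools

flavors = ["seurat", "cell_ranger"]
bools = [True, False]

# Three nested loops over flavor, subset and inplace ...
nested = []
for flavor in flavors:
    for subset in bools:
        for inplace in bools:
            nested.append((flavor, subset, inplace))

# ... versus one flat call: product() yields the same tuples in the same
# order (the last iterable varies fastest), without the extra indentation
flat = list(itertools.product(flavors, bools, bools))
```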

flying-sheep

Please add a release note to docs/release-notes/1.10.3.md.

eroell · 2024-06-27T15:46:40Z

Timeouts and Scrublet failures on Python 3.12?

eroell · 2024-06-27T15:48:40Z

Ah, I see they fail on main too.

eroell · 2024-06-27T15:53:00Z

Coverage decreased. I think that's because some lines are no longer detected as hit now that some `pytest.mark.parametrize` cases were removed?
I could also add the new specific test separately to avoid this; otherwise it looks good to me.

flying-sheep · 2024-06-28T08:26:47Z

> coverage decreased, I think not detected because some pytest.parametrize were removed?

I think codecov just hadn’t updated its comment yet when you saw that.

What you describe can't happen, because it doesn't matter how a line was hit: if a line is run, it's reported as hit. If your changes had caused it to no longer be run, it would have been reported as a miss.
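A toy illustration of this argument, using `sys.settrace` rather than a real coverage tool. Line coverage is recorded as a set of executed lines, so running the same inputs in one loop, or as separate cases whose hits are merged afterwards, yields the same set. `classify` and `hit_lines` are hypothetical helpers for this sketch only.

```python
import sys


def classify(x: int) -> str:
    if x > 0:
        return "pos"
    return "non-pos"


def hit_lines(func, calls) -> set[int]:
    """Return line offsets (relative to the `def` line) executed in `func`."""
    code = func.__code__
    hits: set[int] = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep tracing inside newly entered frames

    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(None)
    return hits


# All inputs in one loop, as in a for-loop-style test
in_one_loop = hit_lines(classify, [(1,), (-1,)])
# Each input as its own case (as with parametrize), hits merged afterwards
as_separate_cases = hit_lines(classify, [(1,)]) | hit_lines(classify, [(-1,)])
```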


hvg subset fix first commit

c53b227

eroell requested a review from flying-sheep May 2, 2024 14:49

eroell added 5 commits May 16, 2024 11:18

Merge branch 'main' into fix-hvg-subset

dde5d56

change style of test design

e9d7123

Merge branch 'main' into fix-hvg-subset

cd0179e

adjust test cases:

042ed39

added comment on test

c0b77e1

Merge branch 'main' into fix-hvg-subset

a1466e2

flying-sheep requested changes Jun 6, 2024


Merge branch 'main' into fix-hvg-subset

760a923

simplify

8c5d368

flying-sheep approved these changes Jun 27, 2024


Eljas added 2 commits June 27, 2024 16:23

add release note

fb91634

add release note

e2f0b66

flying-sheep reviewed Jun 27, 2024


Review comment on docs/release-notes/1.10.3.md (outdated, resolved)

flying-sheep added this to the 1.10.3 milestone Jun 27, 2024

accidentally removed text

13b1f6c

eroell marked this pull request as ready for review June 27, 2024 16:45

flying-sheep merged commit fdfb9a1 into scverse:main Jun 28, 2024
11 of 14 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Jun 28, 2024

Backport PR scverse#3042: hvg flavors seurat and cellranger with batch: bug in subset

21db5f9

meeseeksmachine mentioned this pull request Jun 28, 2024

Backport PR #3042: hvg flavors seurat and cellranger with batch: bug in subset #3128

Merged

flying-sheep pushed a commit that referenced this pull request Jun 28, 2024

Backport PR #3042: hvg flavors seurat and cellranger with batch: bug in subset (#3128)

4e5d903

Co-authored-by: Eljas Roellin
