Hvg seurat v3 numba kernel #3017

Intron7 · 2024-04-22T12:00:40Z

Adds a numba kernel for seurat_v3 on sparse matrices. This kernel is a lot faster and more memory efficient: it neither copies the matrix nor promotes it to 64-bit floats.
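
For readers unfamiliar with the approach, here is a minimal sketch of what such a kernel could look like. It is not the code from this PR: the function name `clipped_sums_csr` and the `clip_val` argument are assumptions for illustration. The idea is to walk the CSR `indices` and `data` arrays once, cast each value to float64 only as a scalar, and accumulate per-gene sums and sums of squares, so the matrix itself is never copied or promoted. If numba is not installed, the sketch falls back to plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba
    def njit(func):
        return func


@njit
def clipped_sums_csr(indices, data, n_cols, clip_val):
    """Per-column sum and sum of squares of clipped values (hypothetical helper)."""
    sums = np.zeros(n_cols, dtype=np.float64)
    sq_sums = np.zeros(n_cols, dtype=np.float64)
    # Only stored (non-zero) entries are visited; zeros contribute nothing.
    for i in range(indices.shape[0]):
        col = indices[i]
        # Promote a single scalar, not the whole matrix.
        value = min(np.float64(data[i]), clip_val[col])
        sums[col] += value
        sq_sums[col] += value * value
    return sums, sq_sums


# Tiny 2-row, 2-column CSR example kept in float32.
data = np.array([1, 5, 2, 10], dtype=np.float32)
indices = np.array([0, 1, 0, 1], dtype=np.int32)
clip_val = np.array([1.5, 6.0])  # assumed per-gene clip thresholds
sums, sq_sums = clipped_sums_csr(indices, data, 2, clip_val)
```

The input `data` stays float32 throughout; only the two small per-gene output arrays are float64.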

scverse-benchmark · 2024-04-22T12:03:32Z

Benchmark changes

Change	Before [`3ba3f46`]	After [`277c1bf`]	Ratio	Benchmark (Parameter)
-	9.09±1ms	6.90±0.06ms	0.76	preprocessing_counts.time_log1p('pbmc3k')

Comparison: https://github.com/scverse/scanpy/compare/3ba3f46b4e6e77e8c6f0551db9663822097b486a..277c1bfb0885234aa757d0fdaeaa9103eb8568e2
Last changed: Thu, 23 May 2024 12:59:15 +0000

More details: https://github.com/scverse/scanpy/pull/3017/checks?check_run_id=25329779518

codecov · 2024-04-22T12:15:44Z

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.86%. Comparing base (3ba3f46) to head (277c1bf).
Report is 47 commits behind head on main.

Files with missing lines	Patch %	Lines
scanpy/preprocessing/_highly_variable_genes.py	85.71%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3017      +/-   ##
==========================================
- Coverage   75.87%   75.86%   -0.01%     
==========================================
  Files         110      110              
  Lines       12533    12533              
==========================================
- Hits         9509     9508       -1     
- Misses       3024     3025       +1

Files with missing lines	Coverage Δ
scanpy/preprocessing/_highly_variable_genes.py	`95.20% <85.71%> (-0.40%)`	⬇️

Intron7 · 2024-05-17T11:59:39Z

~500k Cells.
New Version:
peak memory: 16117.61 MiB, increment: 7709.86 MiB
Wall time: 11 s
Old Version:
peak memory: 40093.42 MiB, increment: 31646.50 MiB
Wall time: 30.9 s

eroell · 2024-05-21T12:05:22Z

Indeed, ~3-7x faster for me & of course quite a bit more memory efficient (quickly checked with scalene).

Tests keep working (for seurat_v3/seurat_v3_paper somewhat tight numeric comparison with Seurat results), nice.

scanpy/preprocessing/_highly_variable_genes.py

ivirshup

Looks good. Kinda surprised how much faster this is.

You could probably get another big speed up by operating on the whole matrix at once, and not creating batch specific matrices.

But I think fine as is.

I added a couple points to address, but that's just naming and docs.

scanpy/preprocessing/_highly_variable_genes.py

docs/release-notes/1.10.2.md

ivirshup · 2024-05-21T23:48:44Z

Would also be nice to have a timing benchmark for this.

Co-authored-by: Isaac Virshup <[email protected]>

Intron7 · 2024-05-22T07:28:43Z

It is the whole matrix for each batch. It's just called `batch_counts` because I needed to make sure the format is CSR.

ivirshup

I was thinking more of getting rid of the whole `for b in np.unique(batch_info)` loop and doing the whole thing in two passes over the matrix.

But LGTM, and I think it's ready to merge once those tests get fixed.
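
One possible reading of this suggestion, sketched under assumptions and not implemented in this PR: instead of slicing out one matrix per batch, a single pass over the full CSR matrix could accumulate sums and sums of squares into `(n_batches, n_genes)` arrays, indexed by an integer batch code per row. The names `per_batch_sums_csr` and `batch_codes` are hypothetical.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba
    def njit(func):
        return func


@njit
def per_batch_sums_csr(indptr, indices, data, batch_codes, n_batches, n_cols):
    """One pass over a CSR matrix, accumulating per-(batch, gene) statistics."""
    sums = np.zeros((n_batches, n_cols), dtype=np.float64)
    sq_sums = np.zeros((n_batches, n_cols), dtype=np.float64)
    for row in range(indptr.shape[0] - 1):
        b = batch_codes[row]  # integer batch code of this cell
        for i in range(indptr[row], indptr[row + 1]):
            col = indices[i]
            value = np.float64(data[i])
            sums[b, col] += value
            sq_sums[b, col] += value * value
    return sums, sq_sums


# 3 cells x 2 genes: rows [1, 0], [3, 2] in batch 0 and [0, 4] in batch 1.
data = np.array([1, 3, 2, 4], dtype=np.float32)
indices = np.array([0, 0, 1, 1], dtype=np.int32)
indptr = np.array([0, 1, 3, 4], dtype=np.int32)
batch_codes = np.array([0, 0, 1], dtype=np.int64)
sums, sq_sums = per_batch_sums_csr(indptr, indices, data, batch_codes, 2, 2)

# Per-batch means follow from the sums and the number of cells per batch.
n_cells = np.bincount(batch_codes, minlength=2)
means = sums / n_cells[:, None]
```

A second pass of the same shape could then apply per-batch clip values once they have been derived from the first-pass statistics; that step is left out here.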

Intron7 · 2024-05-25T13:39:57Z

I don't know what makes the tests fail with dask in utils.

Co-authored-by: Severin Dicks <[email protected]>

Intron7 added 4 commits April 22, 2024 13:15

add kernel

9409834

update to singlethreaded

c321e90

test multi

1cf4a17

final kernel

fa4c75c

Intron7 added this to the 1.10.2 milestone Apr 22, 2024

Intron7 linked an issue Apr 22, 2024 that may be closed by this pull request

Update Preprocessing functions with numba #3011

Closed

Intron7 added Area – Performance 🐌 benchmark labels Apr 22, 2024

Intron7 requested review from flying-sheep and ivirshup April 22, 2024 12:02

flying-sheep added 2 commits April 23, 2024 17:00

Merge branch 'main' into hvg-seurat_v3-update

7b90fc1

move clip out

c45d6f6

flying-sheep reviewed Apr 23, 2024

scanpy/preprocessing/_highly_variable_genes.py (outdated, resolved)

scanpy/preprocessing/_highly_variable_genes.py (outdated, resolved)

Zethson changed the title ~~Hvg seurat v3 update~~ Hvg seurat v3 multicore kernel Apr 28, 2024

Intron7 and others added 7 commits April 29, 2024 11:06

remove nnz

faff3ae

adds releasenote

edeb3b3

Merge branch 'main' into hvg-seurat_v3-update

7398914

Merge branch 'main' into hvg-seurat_v3-update

4d16e67

Merge branch 'main' into hvg-seurat_v3-update

304fa07

fix docs

baf7eaf

single threaded

69c5101

Intron7 changed the title ~~Hvg seurat v3 multicore kernel~~ Hvg seurat v3 numba kernel May 17, 2024

fix indexing

a8868af

Intron7 requested review from flying-sheep and eroell May 17, 2024 12:02

eroell approved these changes May 21, 2024

scanpy/preprocessing/_highly_variable_genes.py (outdated, resolved)

ivirshup reviewed May 21, 2024

scanpy/preprocessing/_highly_variable_genes.py (outdated, resolved)

docs/release-notes/1.10.2.md (outdated, resolved)

Update kernel name

af999eb

Co-authored-by: Isaac Virshup <[email protected]>

ivirshup approved these changes May 23, 2024

Intron7 and others added 3 commits May 23, 2024 14:34

update kernelname

142e5d7

update release note

b43194a

Merge branch 'main' into hvg-seurat_v3-update

277c1bf

Intron7 merged commit 5dc489d into main May 31, 2024
15 checks passed

Intron7 deleted the hvg-seurat_v3-update branch May 31, 2024 09:15

meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request May 31, 2024

Backport PR scverse#3017: Hvg seurat v3 numba kernel

4973db1

meeseeksmachine mentioned this pull request May 31, 2024

Backport PR #3017 on branch 1.10.x (Hvg seurat v3 numba kernel) #3082

Merged

flying-sheep pushed a commit that referenced this pull request Jun 3, 2024

Backport PR #3017: Hvg seurat v3 numba kernel (#3082)

30aa230

Co-authored-by: Severin Dicks <[email protected]>
