Number of genes in human immune cells dataset #24

Open
Yanay1 opened this issue Jul 21, 2023 · 1 comment

Comments

Yanay1 commented Jul 21, 2023

Hello,

In this notebook: https://github.com/theislab/scib-reproducibility/blob/main/notebooks/data_preprocessing/immune_cells/merging/Merging_all_human.ipynb

After merging the human-only datasets, the number of genes is only 12,003. This seems low to me, since each dataset has over 20,000 genes on its own. Do you know why this might be the case?

LuckyMD (Collaborator) commented Dec 20, 2023

Hi @Yanay1,

This is actually about the number we'd expect. An unfiltered gene expression matrix can have anywhere between 25k and 55k features, depending on which version of the genome or transcriptome the reads are aligned to. With the number of datasets being merged in the immune task, it's quite common for only around 10-15k genes to be common to all datasets. The other genes are probably not shared by all datasets. Note that we do an inner join when we merge, so that we don't have to assign 0 counts to genes a dataset never reported when we don't actually know that their expression was 0.
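
To make the inner-join effect concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: the dataset names, gene names, and counts are made up for illustration, and the helper functions are hypothetical. It only shows how the shared gene set shrinks as soon as one dataset lacks a gene.

```python
# Toy illustration of an inner join on genes across datasets.
# All dataset names, gene names, and counts below are invented for this example.
datasets = {
    "dataset_a": {"CD3E": [5, 0], "CD19": [0, 7], "GENE_OLD_ANNOT": [1, 2]},
    "dataset_b": {"CD3E": [3], "CD19": [1], "MS4A1": [4]},
    "dataset_c": {"CD3E": [0, 2, 1], "CD19": [6, 0, 0], "MS4A1": [2, 0, 9]},
}


def all_genes(datasets):
    """Union of gene names: what an outer join would keep."""
    return sorted(set().union(*(set(genes) for genes in datasets.values())))


def shared_genes(datasets):
    """Intersection of gene names: what an inner join keeps."""
    return sorted(set.intersection(*(set(genes) for genes in datasets.values())))


def inner_join(datasets):
    """Concatenate cells across datasets, keeping only genes present in every dataset."""
    keep = shared_genes(datasets)
    merged = {gene: [] for gene in keep}
    for counts in datasets.values():
        for gene in keep:
            merged[gene].extend(counts[gene])
    return merged


merged = inner_join(datasets)
print("genes in any dataset:", len(all_genes(datasets)))
print("genes in every dataset:", len(shared_genes(datasets)))
print("merged genes:", list(merged))
```

An outer join would keep all four genes here, but would have to fill in values for, say, MS4A1 in dataset_a, where that gene was never reported. The inner join avoids inventing those zeros at the cost of dropping genes, which is the same trade-off described above. If I remember correctly, `anndata.concat` exposes this choice through its `join` argument, but check the notebook for what is actually used there.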
