
scv.utils.merge shows a smaller number of cell barcodes and changes obs_names #197

Closed

m21camby opened this issue May 25, 2020 · 14 comments

Labels: question (Further information is requested)
m21camby commented May 25, 2020

Hi,

I have two sample data sets and ran velocyto on each. I am trying to merge the two data sets with scv.utils.merge(1st_adata, 2nd_adata). After scv.utils.merge, the result has fewer cells, and the obs_names are changed as well. It seems n_vars is kept while n_obs shrinks. Is there any way to merge the two sets without loss? (e.g. 1st set = (n_obs × n_vars = 9206 × 55421), 2nd set = (n_obs × n_vars = 8941 × 55421), expected merged set = (n_obs × n_vars = 18147 × 55421))

1st data

```python
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

>>> adata.obs_names
Index(['possorted_genome_bam_T851Q:AAACCCAAGGCTCTAT',
       'possorted_genome_bam_T851Q:AAACCCAAGTGACACG',
       ...
       'possorted_genome_bam_T851Q:TTTGTTGTCGAACGCC'],
      dtype='object', length=9206)
```

2nd data

```python
>>> adata_2
AnnData object with n_obs × n_vars = 8941 × 55421
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

>>> adata_2.obs_names
Index(['possorted_genome_bam_OB274:AAACCCAGTAGTCTGT',
       'possorted_genome_bam_OB274:AAACCCAGTCGGCACT',
       ...
       'possorted_genome_bam_OB274:TTTGTTGTCTCGCTCA'],
      dtype='object', length=8941)
```

Merging the two AnnData objects:

```python
>>> merged_adata = scv.utils.merge(adata, adata_2)
>>> merged_adata
AnnData object with n_obs × n_vars = 31 × 55421
    obs: 'initial_size_unspliced', 'initial_size_spliced', 'initial_size'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
```

Checking the 2nd data's obs_names again:

```python
>>> adata_2.obs_names
Index(['AAACCCAGTAGTCTGT', 'AAACCCAGTCGGCACT', 'AAACCCAGTTAGGCTT',
       ...
       'TTTGTTGTCTCGCTCA'], dtype='object', length=8941)
```

VolkerBergen (Contributor) commented:

In that case, you wouldn't want to merge but to concatenate the two datasets, which can be done with adata = adata1.concatenate(adata2).


m21camby commented May 27, 2020

Hi Volker,

I tried it, and it gave me the error message below:

```python
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

Do you have any idea?

VolkerBergen (Contributor) commented:

I think you have to re-read the data so that you don't have the "cleaned-up" obs_names; otherwise you get non-unique indexes that appear in both datasets. You can find the overlapping indexes with adata1.obs_names.intersection(adata2.obs_names).
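To illustrate that check (made-up barcodes; only the sample prefixes mimic the velocyto names above), the prefixed names cannot collide, while raw barcodes can:

```python
import pandas as pd

# Hypothetical obs_names from two re-read samples; the sample prefixes
# keep the barcodes distinct, so the intersection is empty.
obs1 = pd.Index(["T851Q:AAACCC", "T851Q:TTTGTT"])
obs2 = pd.Index(["OB274:AAACCC", "OB274:TTTGTT"])
print(obs1.intersection(obs2))  # empty Index

# Without the prefixes, identical raw barcodes from the two samples collide:
raw1 = pd.Index(["AAACCC", "TTTGTT"])
raw2 = pd.Index(["AAACCC", "GGGAAA"])
print(raw1.intersection(raw2))  # Index(['AAACCC'], dtype='object')
```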


m21camby commented Jun 8, 2020

Thank you for the advice. I have tried several ways to concatenate the two data sets over the last days but am still struggling.
When I run the above check on the re-read data, there are no intersecting obs_names:

```python
>>> adata.obs_names.intersection(adata_2.obs_names)
Index([], dtype='object')

>>> adata.obs_names
Index(['possorted_genome_bam_T851Q:AAACCCAAGGCTCTAT',
       'possorted_genome_bam_T851Q:AAACCCAAGTGACACG', ...

>>> adata_023.obs_names
Index(['possorted_genome_bam_OB274:AAACCCAGTAGTCTGT',
       'possorted_genome_bam_OB274:AAACCCAGTCGGCACT', ...

>>> New_adata = adata.concatenate(adata_023)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

Let me know if any other ways to do.
Thank you.

VolkerBergen (Contributor) commented:

obs_names need to be unique, which you can ensure by running adata.obs_names_make_unique() first. Then adata = adata1.concatenate(adata2) should work smoothly.


m21camby commented Jun 9, 2020

Thanks for the comment. I tried it, but it didn't work. Checking with the code below:

```python
>>> adata = scv.read('possorted_genome_bam_OB274.loom', cache=True)
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
>>> adata.obs_names_make_unique()
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
```

Since obs_names_make_unique() didn't change n_obs, I think all obs_names are already unique.

Thank you

VolkerBergen (Contributor) commented:

The number of observations doesn't change; entries that aren't unique just get a suffix '-1'.

Do you still get the same error when trying to concatenate those two after making obs_names unique?

If so, I fear I can only help with the data in hand. Maybe you can share it (or better just the obs_names) to [email protected].

KoichiHashikawa commented:

I have exactly the same issue with both scv.utils.merge and adata.concatenate.

I also tried adata.obs_names_make_unique(). It seems to be a more general problem.

VolkerBergen (Contributor) commented:

Thanks — can you please make sure you're running the latest anndata and pandas?

```shell
pip install -U anndata pandas
```

VolkerBergen (Contributor) commented:

And call both adata.obs_names_make_unique() as well as adata.var_names_make_unique().

VolkerBergen (Contributor) commented:

Possibly related to scverse/scanpy#450.

m21camby (Author) commented:

I was using anndata 0.7.3 and pandas 0.25.1. However, it worked after calling var_names_make_unique().

Thank you very much!
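For context on why var_names matter here: loom files frequently contain duplicated gene names, and pandas refuses to reindex a non-unique Index, which is exactly the error in the traceback above. A minimal reproduction with plain pandas (made-up gene names):

```python
import pandas as pd

genes = pd.Index(["GeneA", "GeneA", "GeneB"])  # duplicated var name

try:
    # Reindexing against a non-unique index raises the same error
    # seen when concatenating AnnData objects with duplicated var_names.
    genes.get_indexer(pd.Index(["GeneA", "GeneB"]))
    msg = ""
except Exception as exc:  # pandas.errors.InvalidIndexError in recent pandas
    msg = str(exc)

print(msg)
```

Calling var_names_make_unique() before concatenating removes the duplicates, so the reindexing step can proceed.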

KoichiHashikawa commented:

var_names_make_unique() solved the issue, thanks so much guys.

VolkerBergen (Contributor) commented:

Great to hear that it got resolved. We will need to add an informative warning in anndata.concatenate.
