
scv.utils.merge shows a smaller number of cell barcodes and changes obs_names #197

Closed

m21camby opened this issue May 25, 2020 · 14 comments

Labels: question (Further information is requested)
m21camby commented May 25, 2020

Hi,

I have two sample data sets and ran velocyto on each. I am trying to merge the two data sets with scv.utils.merge(1st_adata, 2nd_adata). After scv.utils.merge, the result has fewer cells, and the obs_names are changed as well. It seems n_vars is kept while n_obs shrinks. Is there any way to merge the two sets without loss? (e.g. 1st set = (n_obs × n_vars = 9206 × 55421), 2nd set = (n_obs × n_vars = 8941 × 55421), expected merged set = (n_obs × n_vars = 18147 × 55421))

1st data

```python
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

>>> adata.obs_names
Index(['possorted_genome_bam_T851Q:AAACCCAAGGCTCTAT',
       'possorted_genome_bam_T851Q:AAACCCAAGTGACACG',
       ...
       'possorted_genome_bam_T851Q:TTTGTTGTCGAACGCC'],
      dtype='object', length=9206)
```

2nd data

```python
>>> adata_2
AnnData object with n_obs × n_vars = 8941 × 55421
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

>>> adata_2.obs_names
Index(['possorted_genome_bam_OB274:AAACCCAGTAGTCTGT',
       'possorted_genome_bam_OB274:AAACCCAGTCGGCACT',
       ...
       'possorted_genome_bam_OB274:TTTGTTGTCTCGCTCA'],
      dtype='object', length=8941)
```

Merging the two AnnData objects:

```python
>>> merged_adata = scv.utils.merge(adata, adata_2)
>>> merged_adata
AnnData object with n_obs × n_vars = 31 × 55421
    obs: 'initial_size_unspliced', 'initial_size_spliced', 'initial_size'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
```

Checking the 2nd data's obs_names again:

```python
>>> adata_2.obs_names
Index(['AAACCCAGTAGTCTGT', 'AAACCCAGTCGGCACT', 'AAACCCAGTTAGGCTT',
       ...
       'TTTGTTGTCTCGCTCA'], dtype='object', length=8941)
```

VolkerBergen (Contributor) commented:

In that case, you wouldn't want to merge but to concatenate the two datasets, which can be done with adata = adata1.concatenate(adata2).


m21camby commented May 27, 2020

Hi Volker,

I tried it, and it gave me the error message below:

```python
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

Do you have any idea?

VolkerBergen (Contributor) commented:

I think you have to re-read the data so that you don't have the "cleaned-up" obs_names; otherwise you get non-unique indexes that appear in both datasets. You can find the overlapping indexes with adata1.obs_names.intersection(adata2.obs_names).
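To illustrate that check (made-up barcodes; only the sample prefixes mimic the velocyto names above), the prefixed names cannot collide, while raw barcodes can:

```python
import pandas as pd

# Hypothetical obs_names from two re-read samples; the sample prefixes
# keep the barcodes distinct, so the intersection is empty.
obs1 = pd.Index(["T851Q:AAACCC", "T851Q:TTTGTT"])
obs2 = pd.Index(["OB274:AAACCC", "OB274:TTTGTT"])
print(obs1.intersection(obs2))  # empty Index

# Without the prefixes, identical raw barcodes from the two samples collide:
raw1 = pd.Index(["AAACCC", "TTTGTT"])
raw2 = pd.Index(["AAACCC", "GGGAAA"])
print(raw1.intersection(raw2))  # Index(['AAACCC'], dtype='object')
```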


m21camby commented Jun 8, 2020

Thank you for the advice. I have tried several ways to concatenate the two data sets over the last days but am still struggling.
When I run the above check on the re-read data, there are no intersecting obs_names:

```python
>>> adata.obs_names.intersection(adata_2.obs_names)
Index([], dtype='object')

>>> adata.obs_names
Index(['possorted_genome_bam_T851Q:AAACCCAAGGCTCTAT',
       'possorted_genome_bam_T851Q:AAACCCAAGTGACACG', ...

>>> adata_023.obs_names
Index(['possorted_genome_bam_OB274:AAACCCAGTAGTCTGT',
       'possorted_genome_bam_OB274:AAACCCAGTCGGCACT', ...

>>> New_adata = adata.concatenate(adata_023)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

Let me know if any other ways to do.
Thank you.

VolkerBergen (Contributor) commented:

obs_names need to be unique, which you can ensure by running adata.obs_names_make_unique() first. Then adata = adata1.concatenate(adata2) should work smoothly.


m21camby commented Jun 9, 2020

Thanks for the comment. I tried it, but it didn't work. Checking with the code below:

```python
>>> adata = scv.read('possorted_genome_bam_OB274.loom', cache=True)
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
>>> adata.obs_names_make_unique()
>>> adata
AnnData object with n_obs × n_vars = 9206 × 55421
```

Since obs_names_make_unique() didn't change n_obs, I think all obs_names are already unique.

Thank you

VolkerBergen (Contributor) commented:

The number of observations doesn't change; entries that aren't unique just get a suffix '-1'.

Do you still get the same error when trying to concatenate those two after making obs_names unique?

If so, I fear I can only help with the data in hand. Maybe you can share it (or better just the obs_names) to [email protected].

KoichiHashikawa commented:

I have exactly the same issue with both scv.utils.merge and adata.concatenate.

I also tried adata.obs_names_make_unique(). It seems to be a more general problem.

VolkerBergen (Contributor) commented:

Thanks — can you please make sure you're running the latest anndata and pandas?

```shell
pip install -U anndata pandas
```

VolkerBergen (Contributor) commented:

And call both adata.obs_names_make_unique() as well as adata.var_names_make_unique().

VolkerBergen (Contributor) commented:

Possibly related to scverse/scanpy#450.

m21camby (Author) commented:

I was using anndata 0.7.3 and pandas 0.25.1. However, it worked after calling var_names_make_unique().

Thank you very much!
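For context on why var_names matter here: loom files frequently contain duplicated gene names, and pandas refuses to reindex a non-unique Index, which is exactly the error in the traceback above. A minimal reproduction with plain pandas (made-up gene names):

```python
import pandas as pd

genes = pd.Index(["GeneA", "GeneA", "GeneB"])  # duplicated var name

try:
    # Reindexing against a non-unique index raises the same error
    # seen when concatenating AnnData objects with duplicated var_names.
    genes.get_indexer(pd.Index(["GeneA", "GeneB"]))
    msg = ""
except Exception as exc:  # pandas.errors.InvalidIndexError in recent pandas
    msg = str(exc)

print(msg)
```

Calling var_names_make_unique() before concatenating removes the duplicates, so the reindexing step can proceed.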

KoichiHashikawa commented:

var_names_make_unique() solved the issue, thanks so much guys.

VolkerBergen (Contributor) commented:

Great to hear that it got resolved. We will need to add an informative warning in anndata.concatenate.
