cbScanpy failing with error: "ValueError: cannot reindex with a non-unique indexer" #66

matthewspeir · 2019-01-29T22:31:23Z

Below are the command and the cbScanpy output; the error is at the bottom:

$ cbScanpy -e ica_cord_blood_h5.h5 -o cbScanpyOut -n ICA_Cord_Blood
INFO:root:Creating cbScanpyOut
cbScanpy $Id$
Input file: ica_cord_blood_h5.h5
Start time: 2019-01-29 12:45:38.211369
scanpy==1.3.7 anndata==0.6.18 numpy==1.16.0 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 
INFO:root:Loading expression matrix: 10X h5 format
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:Writing scanpy matrix to cbScanpyOut/exprMatrix.tsv.gz
INFO:root:Transposing matrix
INFO:root:Converting csc matrix to row-sparse matrix
INFO:root:Writing gene-by-gene, without using pandas
INFO:root:Writing 33694 genes in total
INFO:root:Wrote 0 genes
INFO:root:Wrote 2000 genes
INFO:root:Wrote 4000 genes
INFO:root:Wrote 6000 genes
INFO:root:Wrote 8000 genes
INFO:root:Wrote 10000 genes
INFO:root:Wrote 12000 genes
INFO:root:Wrote 14000 genes
INFO:root:Wrote 16000 genes
INFO:root:Wrote 18000 genes
INFO:root:Wrote 20000 genes
INFO:root:Wrote 22000 genes
INFO:root:Wrote 24000 genes
INFO:root:Wrote 26000 genes
INFO:root:Wrote 28000 genes
INFO:root:Wrote 30000 genes
INFO:root:Wrote 32000 genes
Data has 384000 samples/observations
Data has 33694 genes/variables
Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
After filtering: Data has 320644 samples/observations and 24248 genes/variables
INFO:root:'geneIdType' is not specified in config file.
INFO:root:Auto-detected gene IDs type: symbols
Remove cells with more than 0.050000 percent of mitochondrial genes
Computing percentage of mitochondrial genes
Remove cells with less than 10 and more than 15000 genes
Filtering cells
After filtering: Data has 273034 samples/observations and 24248 genes/variables
Expression normalization, counts per cell = 10000
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Finding highly variable genes: min_mean=0.012500, max_mean=3.000000, min_disp=0.500000
Traceback (most recent call last):
  File "/cluster/home/mspeir/ENV_cellbrowser/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3630, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname, matrixOutFname)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3443, in cbScanpy
    filter_result = sc.pp.filter_genes_dispersion(adata.X, min_mean=minMean, max_mean=maxMean, min_disp=minDisp)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 131, in filter_genes_dispersion
    gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 911, in __getitem__
    return self._get_with(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 953, in _get_with
    return self.reindex(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 3734, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4346, in reindex
    fill_value, copy).__finalize__(self)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4359, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/indexes/category.py", line 503, in reindex
    raise ValueError("cannot reindex with a non-unique indexer")
ValueError: cannot reindex with a non-unique indexer

This is using the 'Raw Counts Matrix - Cord Blood' h5 file from the 'Census of Immune Cells' data set here: https://preview.data.humancellatlas.org/.

maximilianh · 2019-01-30T14:03:54Z

Oh darn, I spent quite a while trying to track this down, then had lunch, and only then had the idea of googling it. It's a known problem with your version combination: scverse/scanpy#450. It's working on my machine, and I just saw that my version of pandas is pandas==0.22.0. This problem has just been fixed in Scanpy. So there are at least two options for you:

- you can downgrade pandas with pip install pandas==0.23.0
- you can upgrade scanpy to the current master branch (git clone + python setup.py install)
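A quick way to see which side of this problem an environment is on is to compare the installed pandas version with the versions reported in this thread. The sketch below assumes, based only on the two comments above, that pandas 0.24.0 triggers the error with scanpy 1.3.7 while 0.22.0 and 0.23.0 do not. `version_tuple` and `pandas_may_trigger_reindex_error` are hypothetical helpers, not part of cellbrowser or scanpy, and the threshold says nothing about later scanpy releases.

```python
import re

# Assumption taken from this thread only: pandas 0.24.0 failed with scanpy 1.3.7,
# while pandas 0.22.0 / 0.23.0 worked. Not meaningful for newer scanpy versions.
FIRST_AFFECTED_PANDAS = (0, 24)


def version_tuple(version):
    """Turn '0.24.0' or '1.4.5.post2' into its leading numeric parts, e.g. (0, 24, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'post2'
        parts.append(int(match.group()))
    return tuple(parts)


def pandas_may_trigger_reindex_error(pandas_version):
    """Hypothetical check: True if pandas_version is 0.24 or newer."""
    return version_tuple(pandas_version) >= FIRST_AFFECTED_PANDAS


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed")
    else:
        print(pandas.__version__, pandas_may_trigger_reindex_error(pandas.__version__))
```

Tuple comparison handles versions of different lengths, so (0, 24, 0) counts as at least (0, 24) while (0, 23, 0) does not.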

maximilianh · 2019-01-30T17:38:19Z

Closing this as it's another scanpy problem.

cotedivoir · 2020-01-29T00:45:50Z

Got the same issue. I tried downgrading pandas as you suggested, but it's still there:

cbScanpy -e filtered_gene_bc_matrices/hg19/matrix.mtx -o scanpyOut -n pbmc3k
INFO:root:Loading Scanpy libraries
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:get_version:dirname: Trying to get version of get_version from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('get[-]version-([\d.]+?)(?:\.dev(\d+))?(?:_+-)?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for get_version in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:get_version:dirname: Trying to get version of legacy_api_wrap from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('legacy[-]api[_-]wrap-([\d.]+?)(?:\.dev(\d+))?(?:_+-)?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for legacy_api_wrap in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:root:cbScanpy $Id$
INFO:root:Input file: filtered_gene_bc_matrices/hg19/matrix.mtx
INFO:root:Restricting OPENBLAS to 4 threads
INFO:root:Start time: 2020-01-28 16:40:36.219846
scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0
INFO:root:Loading expression matrix: mtx format
INFO:root:Data has 2700 samples/observations
INFO:root:Data has 32738 genes/variables
INFO:root:Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:After filtering: Data has 2700 samples/observations and 13714 genes/variables
INFO:root:'geneIdType' is not specified in config file or set to 'auto'.
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Remove cells with more than 0.050000 percent of mitochondrial genes
INFO:root:Computing percentage of mitochondrial genes
Traceback (most recent call last):
  File "/home/cotedivoir/.local/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5390, in cbScanpyCli
    adata, params = cbScanpy(matrixFname, metaFname, inCluster, confFname, figDir, logFname)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5040, in cbScanpy
    adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1) / np.sum(adata.X, axis=1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1049, in __getitem__
    oidx, vidx = self._normalize_indices(index)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1030, in _normalize_indices
    return _normalize_indices(index, self.obs_names, self.var_names)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 34, in _normalize_indices
    ax1 = _normalize_index(ax1, names1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 89, in _normalize_index
    positions = index.get_indexer(indexer)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
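Both logs repeat the warning "Variable names are not unique", and this traceback ends with pandas refusing to look up names in an index that is not uniquely valued. One plausible reading, not confirmed in this thread, is that duplicate gene symbols break the name-based lookup of mitochondrial genes. The warning itself names the anndata fix, `adata.var_names_make_unique()`, which renames repeated names by appending a number. The stdlib-only sketch below imitates that idea so it runs without anndata; `make_names_unique` and `positions` are hypothetical helpers, the gene list is made up, and the suffix scheme (`NAME-1`, `NAME-2`, ...) is an assumption about anndata's default rather than its actual implementation.

```python
def make_names_unique(names, join="-"):
    """Return a copy of `names` where every repeat gets a numeric suffix.

    The first occurrence keeps its name; later repeats become NAME-1, NAME-2, ...
    If a candidate such as NAME-1 already exists in the input, the next number is tried.
    """
    taken = set(names)   # every original name, so suffixes never collide with them
    first_seen = set()   # names whose first occurrence has already been emitted
    last_suffix = {}     # highest suffix handed out so far for each repeated name
    result = []
    for name in names:
        if name not in first_seen:
            first_seen.add(name)
            result.append(name)
            continue
        n = last_suffix.get(name, 0)
        candidate = name
        while candidate in taken:
            n += 1
            candidate = f"{name}{join}{n}"
        last_suffix[name] = n
        taken.add(candidate)
        result.append(candidate)
    return result


def positions(names, wanted):
    """Look up the column position of each wanted name.

    Like the pandas lookup in the traceback, this refuses to work when names
    repeat, because a repeated name has no single position.
    """
    index = {}
    for i, name in enumerate(names):
        if name in index:
            raise ValueError(f"names are not unique: {name!r} repeats")
        index[name] = i
    return [index[w] for w in wanted]


# Hypothetical gene list with one duplicated symbol.
genes = ["MT-CO1", "GENE_A", "GENE_A", "MT-ND1"]
try:
    positions(genes, ["MT-CO1", "MT-ND1"])
except ValueError as err:
    print("lookup failed:", err)

unique_genes = make_names_unique(genes)
print(unique_genes)                                   # ['MT-CO1', 'GENE_A', 'GENE_A-1', 'MT-ND1']
print(positions(unique_genes, ["MT-CO1", "MT-ND1"]))  # [0, 3]
```

In an actual scanpy session, the corresponding step would be calling adata.var_names_make_unique() right after loading the matrix, before any indexing by gene name.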

maximilianh · 2020-01-29T15:14:14Z

It looks like you are running scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0. How did you install scanpy?

maximilianh · 2020-01-29T15:14:51Z

Overall, it doesn't look like this is an issue with the cellbrowser, but rather with your scanpy installation. How did you install the cellbrowser, and which version is it?

cotedivoir · 2020-01-29T17:11:22Z

Thank you for the reply!

version: cellbrowser (0.7.7)
how installed: pip3 install

maximilianh · 2020-01-29T20:57:48Z

Thanks! How did you install scanpy? With conda?

cotedivoir · 2020-01-29T23:36:00Z

No, same way: pip3.

maximilianh · 2020-01-30T14:22:20Z

Hm, I can't reproduce this... Is this on OSX or Linux? If Linux, which version? This doesn't seem to be a cellbrowser problem, but something with your scanpy install... Did you ask the scanpy people?

cotedivoir · 2020-01-31T22:37:45Z

I think I should have started with this: I use WSL on Windows with the Ubuntu distribution.

maximilianh · 2020-02-01T22:47:19Z

Urgh. OK, I don't really want to dig into this. It's a problem with scanpy, partially due to them constantly making breaking changes. Can you ask them? I don't know much about scanpy. You can probably reproduce this problem easily by running a standard scanpy tutorial. Can I ask why you're using the scanpy pipeline of cellbrowser? Do you have your own results, or is this part of a term project where cbScanpy gives you a ready-made pipeline for single-cell analysis?

cotedivoir · 2020-02-02T01:28:14Z

OK, sorry for bothering you. I'm just trying to learn more about single-cell data analysis, and there isn't much structured information around. So I found cellbrowser, and as you say, cbScanpy already has a prebuilt pipeline, so I decided that would be a way to start.

maximilianh · 2020-02-02T19:23:00Z

It usually is, but only if your scanpy works. The recommended way to install scanpy is conda, since it has a ton of dependencies.
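Since the install method keeps coming up, it can help to check which Python interpreter actually runs and where scanpy is imported from. The traceback paths above point into ~/.local/lib/python3.6/site-packages, i.e. a pip user install, so one possibility worth ruling out (an assumption, not something established in this thread) is that a pip-installed cbScanpy in ~/.local/bin still runs under the system Python, or that a pip copy shadows a conda one. The sketch below uses only the standard library (`importlib.metadata` needs Python 3.8+); the `conda-meta` directory check is a heuristic, and the helper names are hypothetical.

```python
import importlib.util
import os
import sys
from importlib import metadata  # standard library since Python 3.8


def in_conda_env(prefix=sys.prefix):
    """Heuristic: conda environments contain a 'conda-meta' directory at their root."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def import_location(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


def installed_version(dist_name):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("conda environment:", in_conda_env())
    for name in ["scanpy", "anndata", "pandas", "numpy", "cellbrowser"]:
        version = installed_version(name) or "not installed"
        print(f"{name:12} {version:15} {import_location(name)}")
```

Running it with the same interpreter named on the first line of the cbScanpy script shows whether the packages come from the conda environment or from ~/.local. On Python 3.6, as in this thread, pip show scanpy or conda list scanpy gives the version part of the same information.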

maximilianh · 2020-02-02T19:24:03Z

Oh, that would be the reason, by the way: install scanpy with conda! I don't think the pip way is recommended anymore. Try conda and let me know if it works then.

cotedivoir · 2020-02-07T21:46:34Z

Hi! I installed scanpy via conda, but I still get the same error.

matthewspeir · 2020-02-13T17:34:11Z

@ivirshup We think this might be an issue with scanpy (or maybe just their install of scanpy). Do you have insights into the issues this user is seeing?

maximilianh closed this as completed Jan 30, 2019