cbScanpy failing with error: "ValueError: cannot reindex with a non-unique indexer" #66

Closed
matthewspeir opened this issue Jan 29, 2019 · 16 comments

Comments

@matthewspeir
Collaborator

Command, cbScanpy output and error at the bottom:

$ cbScanpy -e ica_cord_blood_h5.h5 -o cbScanpyOut -n ICA_Cord_Blood
INFO:root:Creating cbScanpyOut
cbScanpy $Id$
Input file: ica_cord_blood_h5.h5
Start time: 2019-01-29 12:45:38.211369
scanpy==1.3.7 anndata==0.6.18 numpy==1.16.0 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 
INFO:root:Loading expression matrix: 10X h5 format
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:Writing scanpy matrix to cbScanpyOut/exprMatrix.tsv.gz
INFO:root:Transposing matrix
INFO:root:Converting csc matrix to row-sparse matrix
INFO:root:Writing gene-by-gene, without using pandas
INFO:root:Writing 33694 genes in total
INFO:root:Wrote 0 genes
INFO:root:Wrote 2000 genes
INFO:root:Wrote 4000 genes
INFO:root:Wrote 6000 genes
INFO:root:Wrote 8000 genes
INFO:root:Wrote 10000 genes
INFO:root:Wrote 12000 genes
INFO:root:Wrote 14000 genes
INFO:root:Wrote 16000 genes
INFO:root:Wrote 18000 genes
INFO:root:Wrote 20000 genes
INFO:root:Wrote 22000 genes
INFO:root:Wrote 24000 genes
INFO:root:Wrote 26000 genes
INFO:root:Wrote 28000 genes
INFO:root:Wrote 30000 genes
INFO:root:Wrote 32000 genes
Data has 384000 samples/observations
Data has 33694 genes/variables
Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
After filtering: Data has 320644 samples/observations and 24248 genes/variables
INFO:root:'geneIdType' is not specified in config file.
INFO:root:Auto-detected gene IDs type: symbols
Remove cells with more than 0.050000 percent of mitochondrial genes
Computing percentage of mitochondrial genes
Remove cells with less than 10 and more than 15000 genes
Filtering cells
After filtering: Data has 273034 samples/observations and 24248 genes/variables
Expression normalization, counts per cell = 10000
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Finding highly variable genes: min_mean=0.012500, max_mean=3.000000, min_disp=0.500000
Traceback (most recent call last):
  File "/cluster/home/mspeir/ENV_cellbrowser/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3630, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname, matrixOutFname)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3443, in cbScanpy
    filter_result = sc.pp.filter_genes_dispersion(adata.X, min_mean=minMean, max_mean=maxMean, min_disp=minDisp)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 131, in filter_genes_dispersion
    gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 911, in __getitem__
    return self._get_with(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 953, in _get_with
    return self.reindex(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 3734, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4346, in reindex
    fill_value, copy).__finalize__(self)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4359, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/indexes/category.py", line 503, in reindex
    raise ValueError("cannot reindex with a non-unique indexer")
ValueError: cannot reindex with a non-unique indexer

This uses the 'Raw Counts Matrix - Cord Blood' h5 file from the 'Census of Immune Cells' data set, available here: https://preview.data.humancellatlas.org/.

@maximilianh
Owner

maximilianh commented Jan 30, 2019 via email

@maximilianh
Owner

Closing this as it's another scanpy problem.

@cotedivoir

cotedivoir commented Jan 29, 2020

I got the same issue. I tried downgrading pandas as you suggested, but the error is still there:

cbScanpy -e filtered_gene_bc_matrices/hg19/matrix.mtx -o scanpyOut -n pbmc3k
INFO:root:Loading Scanpy libraries
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:get_version:dirname: Trying to get version of get_version from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('get[-]version-([\d.]+?)(?:\.dev(\d+))?(?:_+-)?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for get_version in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:get_version:dirname: Trying to get version of legacy_api_wrap from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('legacy[-]api[_-]wrap-([\d.]+?)(?:\.dev(\d+))?(?:_+-)?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for legacy_api_wrap in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:root:cbScanpy $Id$
INFO:root:Input file: filtered_gene_bc_matrices/hg19/matrix.mtx
INFO:root:Restricting OPENBLAS to 4 threads
INFO:root:Start time: 2020-01-28 16:40:36.219846
scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0
INFO:root:Loading expression matrix: mtx format
INFO:root:Data has 2700 samples/observations
INFO:root:Data has 32738 genes/variables
INFO:root:Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call .var_names_make_unique.
Variable names are not unique. To make them unique, call .var_names_make_unique.
INFO:root:Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call .var_names_make_unique.
Variable names are not unique. To make them unique, call .var_names_make_unique.
INFO:root:After filtering: Data has 2700 samples/observations and 13714 genes/variables
INFO:root:'geneIdType' is not specified in config file or set to 'auto'.
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Remove cells with more than 0.050000 percent of mitochondrial genes
INFO:root:Computing percentage of mitochondrial genes
Traceback (most recent call last):
  File "/home/cotedivoir/.local/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5390, in cbScanpyCli
    adata, params = cbScanpy(matrixFname, metaFname, inCluster, confFname, figDir, logFname)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5040, in cbScanpy
    adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1) / np.sum(adata.X, axis=1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1049, in __getitem__
    oidx, vidx = self._normalize_indices(index)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1030, in _normalize_indices
    return _normalize_indices(index, self.obs_names, self.var_names)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 34, in _normalize_indices
    ax1 = _normalize_index(ax1, names1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 89, in _normalize_index
    positions = index.get_indexer(indexer)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
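
For readers hitting the same traceback: the last frames show a label lookup (`index.get_indexer`) on the gene names, and the log warns several times that the variable names are not unique. The sketch below reproduces that failure with pandas alone and shows one possible way to deduplicate names before the lookup. It is only an illustration, not what cbScanpy or anndata actually does: the gene names are made up, and `make_unique` is a hypothetical helper that loosely imitates the idea behind the `.var_names_make_unique` call mentioned in the warning.

```python
import pandas as pd


def make_unique(names):
    """Hypothetical helper: append -1, -2, ... to repeated names."""
    seen = {}            # how often each original name has appeared so far
    taken = set(names)   # every name already in use, to avoid collisions
    out = []
    for name in names:
        count = seen.get(name, 0)
        if count == 0:
            out.append(name)          # first occurrence keeps its name
        else:
            candidate = f"{name}-{count}"
            while candidate in taken:  # skip suffixes that already exist
                count += 1
                candidate = f"{name}-{count}"
            taken.add(candidate)
            out.append(candidate)
        seen[name] = count + 1
    return pd.Index(out)


# Illustrative gene symbols (assumption); "TBCE" appears twice on purpose.
var_names = pd.Index(["MT-CO1", "TBCE", "MT-ND1", "TBCE"])
mito_genes = [n for n in var_names if n.startswith("MT-")]

# A label lookup on a non-unique index fails, even though the
# requested mitochondrial names themselves are not duplicated.
try:
    var_names.get_indexer(mito_genes)
    lookup_failed = False
except Exception as err:  # pandas raises InvalidIndexError here
    lookup_failed = True
    print(type(err).__name__, err)

# After deduplication the same lookup returns integer positions.
unique_names = make_unique(var_names)
positions = unique_names.get_indexer(mito_genes)
print(list(unique_names), list(positions))
```

If this is indeed the cause, making the names unique right after loading the matrix (before the mitochondrial filter) should avoid the error, at the cost of renaming the repeated genes.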

@maximilianh
Owner

maximilianh commented Jan 29, 2020 via email

@maximilianh
Owner

maximilianh commented Jan 29, 2020 via email

@cotedivoir

Thank you for the reply!

version: cellbrowser (0.7.7)
how installed: pip3 install

@maximilianh
Owner

maximilianh commented Jan 29, 2020 via email

@cotedivoir

> Thanks! How did you install scanpy? With conda?

No, the same way: pip3.


@maximilianh
Owner

maximilianh commented Jan 30, 2020 via email

@cotedivoir

I think I should have started with this: I use WSL on Windows with an Ubuntu distribution.

@maximilianh
Owner

maximilianh commented Feb 1, 2020 via email

@cotedivoir

OK, sorry for bothering you. I'm just trying to learn more about single-cell data analysis, and there is not much structured information around. I found cellbrowser, and since, as you say, cbScanpy already has a prebuilt pipeline, I decided that would be a way to start.

@maximilianh
Owner

maximilianh commented Feb 2, 2020 via email

@maximilianh
Owner

maximilianh commented Feb 2, 2020 via email

@cotedivoir

Hi! I installed scanpy via conda and still get the same error.

@matthewspeir
Collaborator Author

@ivirshup We think this might be an issue with scanpy (or maybe just their install of scanpy). Do you have insights into the issues this user is seeing?
