ValueError with sc.pp.highly_variable_genes with pandas 0.24 #450

fbnrst · 2019-01-28T10:00:07Z

Minimal example:

import scanpy.api as sc
sc.logging.print_versions()

adata = sc.datasets.blobs()

sc.pp.highly_variable_genes(adata)

Output:

scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-5d93fbf298b7> in <module>
      4 adata = sc.datasets.blobs()
      5 
----> 6 sc.pp.highly_variable_genes(adata)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
    115         # a normalized disperion of 1
    116         one_gene_per_bin = disp_std_bin.isnull()
--> 117         gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
    118         if len(gen_indices) > 0:
    119             logg.msg(

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    909         Please use .at[] or .iat[] accessors.
    910 
--> 911         Parameters
    912         ----------
    913         index : label

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    951         -------
    952         series : Series
--> 953             If label is contained, will be reference to calling Series,
    954             otherwise a new object
    955         """

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in reindex(self, index, **kwargs)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4344 
   4345             elif not is_list_like(value):
-> 4346                 new_data = self._data.fillna(value=value, limit=limit,
   4347                                              inplace=inplace,
   4348                                              downcast=downcast)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4357             return self._constructor(new_data).__finalize__(self)
   4358 
-> 4359     def ffill(self, axis=None, inplace=False, limit=None, downcast=None):
   4360         """
   4361         Synonym for :meth:`DataFrame.fillna(method='ffill') <DataFrame.fillna>`

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/indexes/category.py in reindex(self, target, method, level, limit, tolerance)
    501         # in which case we are going to conform to the passed Categorical
    502         new_target = np.asarray(new_target)
--> 503         if is_categorical_dtype(target):
    504             new_target = target._shallow_copy(new_target, name=self.name)
    505         else:

ValueError: cannot reindex with a non-unique indexer

The error is gone with pandas 0.23.4. There was a change in the API of reindex in pandas: http://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html
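The failing line in the traceback indexes a Series (indexed by bin) with a categorical column whose values repeat. Below is a minimal sketch of one way to do the same lookup by integer category code, which avoids label-based reindexing altogether. The variable names follow the traceback; the toy values and the "mean" and "dispersion" column names are assumptions for illustration, and this is not necessarily the change that was made in scanpy.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the per-gene table (values are made up for illustration).
df = pd.DataFrame({
    "mean": [0.1, 0.2, 0.3, 5.0],
    "dispersion": [1.0, 2.0, 3.0, 4.0],
})
df["mean_bin"] = pd.cut(df["mean"], bins=2)

# Per-bin standard deviation; a bin that holds a single gene yields NaN.
# observed=False keeps one row per category, in category order.
disp_std_bin = df.groupby("mean_bin", observed=False)["dispersion"].std(ddof=1)
one_gene_per_bin = disp_std_bin.isnull()

# Positional lookup: each gene's category code selects its bin's flag,
# so no Series is indexed with a non-unique categorical key.
codes = df["mean_bin"].cat.codes.to_numpy()
gen_indices = np.where(one_gene_per_bin.to_numpy()[codes])[0].tolist()
print(gen_indices)  # [3]
```

Here the first three genes share a bin and the fourth sits alone in its bin, so only index 3 is reported.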

jipeifeng · 2019-01-29T07:33:15Z

I ran into the same problem and fixed it by downgrading pandas to 0.23.

maximilianh · 2019-01-30T14:01:42Z

Same problem here. So glad that I found this ticket. From flying-sheep's commit, it looks like either upgrading scanpy to the newest version or downgrading pandas would work. There is also some anndata version requirement going up, no idea why.

flying-sheep · 2019-01-30T15:54:01Z

because part of the fix is in anndata, of course!

falexwolf · 2019-02-03T17:11:31Z

Thank you so much for fixing this, @flying-sheep!

flying-sheep · 2019-02-04T09:42:44Z

Sure! @ivirshup figured out independently within 2 hours of me that is_string_dtype now works differently: scverse/anndata#107

The fix needed several parts:

- I fixed the tests so they actually work (they had been broken all along because they used a hardcoded file name instead of tmp_path, and therefore reused the same file)
- I pulled his changes, which covered the writing portion of the needed fixes
- I fixed the reading portion in scverse/anndata@4c81631
- I fixed the highly variable genes function, which relied on slightly different Series behavior in pandas 0.23
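The point about hardcoded file names versus tmp_path can be sketched as follows. This is a generic pytest illustration, not the actual anndata test: write_then_read is a hypothetical stand-in for a write/read round trip, and the file name is arbitrary.

```python
from pathlib import Path


def write_then_read(path: Path, text: str) -> str:
    """Hypothetical round-trip helper standing in for a real write/read pair."""
    path.write_text(text)
    return path.read_text()


def test_roundtrip(tmp_path: Path) -> None:
    # pytest injects tmp_path: a fresh directory unique to this test call,
    # so no file from an earlier test or run can be picked up by accident.
    target = tmp_path / "data.txt"
    assert not target.exists()
    assert write_then_read(target, "payload") == "payload"
```

With a hardcoded path, a read could succeed on a stale file even when the write step is broken; a per-test directory rules that out.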

rjpbonnal · 2019-02-04T16:01:12Z

Dear All,
running the tutorial pbmc3k.ipynb

I get an error similar to the one above:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-ea8d9dc47463> in <module>
----> 1 sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
    115         # a normalized disperion of 1
    116         one_gene_per_bin = disp_std_bin.isnull()
--> 117         gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
    118         if len(gen_indices) > 0:
    119             logg.msg(

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in reindex(self, index, **kwargs)
   3732     @Appender(generic.NDFrame.reindex.__doc__)
   3733     def reindex(self, index=None, **kwargs):
-> 3734         return super(Series, self).reindex(index=index, **kwargs)
   3735 
   3736     def drop(self, labels=None, axis=0, index=None, columns=None,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4354         # perform the reindex on the axes
   4355         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356                                   fill_value, copy).__finalize__(self)
   4357 
   4358     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4367             ax = self._get_axis(a)
   4368             new_index, indexer = ax.reindex(labels, level=level, limit=limit,
-> 4369                                             tolerance=tolerance, method=method)
   4370 
   4371             axis = self._get_axis_number(a)

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/indexes/category.py in reindex(self, target, method, level, limit, tolerance)
    501         else:
    502             if not target.is_unique:
--> 503                 raise ValueError("cannot reindex with a non-unique indexer")
    504 
    505             indexer, missing = self.get_indexer_non_unique(np.array(target))

ValueError: cannot reindex with a non-unique indexer

These are the packages I have installed for this analysis, from both conda and pip

LuckyMD · 2019-02-04T16:09:22Z

Hi @helios,

You will have to install scanpy from github to use the fix for this. The latest release (1.3.7) does not yet include the fix.

rjpbonnal · 2019-02-04T16:19:55Z

Hi @LuckyMD,
thanks. I was doing exactly that. I can confirm that everything works as expected.

falexwolf · 2019-02-05T01:13:28Z

@flying-sheep, so, we need 1.3.8 essentially, now, right?

flying-sheep · 2019-02-05T08:59:14Z

Yes. Do you think scanpy is quality-controlled enough that we can cut new releases whenever we please? Otherwise I’m not comfortable just creating a new tag from master and releasing it by myself.

falexwolf · 2019-02-05T12:06:25Z

Good, yes, by now test coverage should be high enough. I can't think of any major hole anymore. Still, it would be nice to briefly coordinate releases for Scanpy, at least for now. But yes, in this case, please make release 1.3.8!

flying-sheep · 2019-02-05T12:47:45Z

I did, and then I realized that we have the import scanpy as sc change and more features, so I called it 1.4. Btw: could you please add me as owner of scanpy and anndata on PyPI? Then I can manage releases and delete files on PyPI.

falexwolf · 2019-02-06T16:43:18Z

Wasn't entirely planned to have 1.4 exactly now, but, fine, we have some stuff already 😄 I briefly tweeted about it: https://twitter.com/falexwolf/status/1093140419997822978
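To summarize the version constraints discussed in this thread, here is a small sketch that flags an affected combination. It assumes the fix first shipped in scanpy 1.4 and that the problem starts with pandas 0.24, as reported above. Both helper names are hypothetical, the parser only handles plain numeric version strings, and the check does not know which older scanpy releases first contained the failing code.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain numeric string such as '0.24.0' into (0, 24, 0)."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_affected(scanpy_version: str, pandas_version: str) -> bool:
    """True for scanpy older than 1.4 combined with pandas 0.24 or newer."""
    old_scanpy = parse_version(scanpy_version) < (1, 4)
    new_pandas = parse_version(pandas_version) >= (0, 24)
    return old_scanpy and new_pandas


print(is_affected("1.3.7", "0.24.0"))  # True
print(is_affected("1.3.7", "0.23.4"))  # False
print(is_affected("1.4", "0.24.0"))    # False
```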

flying-sheep closed this as completed in 383a1b5 Jan 29, 2019

maximilianh mentioned this issue Jan 30, 2019

cbScanpy failing with error: "ValueError: cannot reindex with a non-unique indexer" maximilianh/cellBrowser#66

Closed

pinin4fjords mentioned this issue Feb 17, 2019

Scanpy scripts pandas fix bioconda/bioconda-recipes#13678

Merged

maximilianh mentioned this issue Feb 27, 2019

Bugs in v4.38 maximilianh/cellBrowser#73

Closed

Xparx pushed a commit to Xparx/scanpy that referenced this issue Jan 2, 2020

Fix highly variable genes. Fixes scverse#450

2819112

VolkerBergen mentioned this issue Jun 11, 2020

scv.utils.merge show less number of Cell barcode and change obs_names theislab/scvelo#197

Closed
