ValueError with sc.pp.highly_variable_genes with pandas 0.24 #450

Closed
fbnrst opened this issue Jan 28, 2019 · 13 comments

@fbnrst
Contributor

fbnrst commented Jan 28, 2019

Minimal example:

import scanpy.api as sc
sc.logging.print_versions()

adata = sc.datasets.blobs()

sc.pp.highly_variable_genes(adata)

Output:

**scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1 

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-5d93fbf298b7> in <module>
      4 adata = sc.datasets.blobs()
      5 
----> 6 sc.pp.highly_variable_genes(adata)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
    115         # a normalized disperion of 1
    116         one_gene_per_bin = disp_std_bin.isnull()
--> 117         gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
    118         if len(gen_indices) > 0:
    119             logg.msg(

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    909         Please use .at[] or .iat[] accessors.
    910 
--> 911         Parameters
    912         ----------
    913         index : label

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    951         -------
    952         series : Series
--> 953             If label is contained, will be reference to calling Series,
    954             otherwise a new object
    955         """

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/series.py in reindex(self, index, **kwargs)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4344 
   4345             elif not is_list_like(value):
-> 4346                 new_data = self._data.fillna(value=value, limit=limit,
   4347                                              inplace=inplace,
   4348                                              downcast=downcast)

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4357             return self._constructor(new_data).__finalize__(self)
   4358 
-> 4359     def ffill(self, axis=None, inplace=False, limit=None, downcast=None):
   4360         """
   4361         Synonym for :meth:`DataFrame.fillna(method='ffill') <DataFrame.fillna>`

~/miniconda3/envs/spols190117/lib/python3.6/site-packages/pandas/core/indexes/category.py in reindex(self, target, method, level, limit, tolerance)
    501         # in which case we are going to conform to the passed Categorical
    502         new_target = np.asarray(new_target)
--> 503         if is_categorical_dtype(target):
    504             new_target = target._shallow_copy(new_target, name=self.name)
    505         else:

ValueError: cannot reindex with a non-unique indexer

**

The error goes away with pandas 0.23.4. There was a change to the reindex API in pandas: http://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html
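Here is a sketch of what the failing line in highly_variable_genes does, on hypothetical toy data. Genes are binned by mean with pd.cut (a categorical column), the per-bin standard deviation of the dispersion is NaN for bins holding a single gene, and those per-bin flags are then looked up for every gene. The failing form indexes a Series that has a CategoricalIndex with a key full of repeated labels, which pandas 0.24 routes through reindex() (see the traceback). The cat.codes lookup below is one version-independent alternative chosen for illustration; it is not necessarily the change that went into scanpy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: per-gene mean expression and dispersion.
df = pd.DataFrame({
    "means": [0.1, 0.2, 1.5, 2.5, 2.6, 2.7],
    "dispersions": [1.0, 1.2, 0.8, 2.0, 2.2, 1.9],
})
# Bin genes by mean; pd.cut returns a categorical column, so the per-bin
# statistics below end up indexed by a CategoricalIndex.
df["mean_bin"] = pd.cut(df["means"], bins=[0, 1, 2, 3])

# Standard deviation of the dispersion within each bin (observed=False keeps
# every bin, in category order). A bin with a single gene yields NaN.
disp_std_bin = df.groupby("mean_bin", observed=False)["dispersions"].std(ddof=1)
one_gene_per_bin = disp_std_bin.isnull()

# The line in the traceback indexed this Series with a categorical key that
# repeats labels, i.e. one_gene_per_bin[df["mean_bin"]], which pandas 0.24
# sent through reindex() and rejected as a non-unique indexer.
# Positional lookup through the integer category codes avoids label-based
# reindexing entirely (every value lies inside a bin here, so no code is -1).
codes = df["mean_bin"].cat.codes.to_numpy()
flags = one_gene_per_bin.to_numpy()[codes]
gen_indices = np.where(flags)[0].tolist()
print(gen_indices)  # [2]: only the gene with mean 1.5 sits alone in its bin
```

Values that fall outside every bin get code -1, which NumPy would silently treat as the last bin, so real code would need to handle that case separately.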

@jipeifeng

jipeifeng commented Jan 29, 2019

I have the same problem. I fixed it by downgrading pandas to 0.23.

@maximilianh
Contributor

Same problem here. I'm so glad I found this ticket. From flying-sheep's commit, it looks like either upgrading scanpy to the newest version or downgrading pandas would work. The commit also raises the required anndata version; I have no idea why.
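Based only on what this thread reports (pandas 0.24 breaks scanpy 1.3.7, and the fix first shipped in the release that became 1.4), a small helper can flag an affected environment. The function names are hypothetical, and the anndata side of the fix is not checked because the required anndata version is not stated here.

```python
import re


def major_minor(ver: str) -> tuple:
    """Return (major, minor) from a version string such as '0.24.0' or '1.3.7'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def hvg_bug_expected(scanpy_version: str, pandas_version: str) -> bool:
    """True for the combination reported in this issue: pandas >= 0.24 together
    with a scanpy release older than 1.4 (the first release with the fix)."""
    return major_minor(pandas_version) >= (0, 24) and major_minor(scanpy_version) < (1, 4)
```

For example, call hvg_bug_expected(sc.__version__, pd.__version__) after importing scanpy and pandas. A development install of scanpy from GitHub made before 1.4 may still report a 1.3.x version and would be flagged even though it contains the fix.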

@flying-sheep
Member

Because part of the fix is in anndata, of course!

@falexwolf
Member

Thank you so much for fixing this, @flying-sheep!

@flying-sheep
Member

flying-sheep commented Feb 4, 2019

Sure! @ivirshup independently figured out, within 2 hours of me, that is_string_dtype now works differently: scverse/anndata#107
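The thread does not spell out how is_string_dtype changed; scverse/anndata#107 is the place to look. As a general defensive pattern, assuming code that has to tell categorical columns apart from plain string columns, testing for CategoricalDtype before calling is_string_dtype keeps the result from depending on how a given pandas version classifies categoricals. column_kind is a hypothetical helper, not anndata code.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype, is_string_dtype


def column_kind(col: pd.Series) -> str:
    """Classify a column as 'categorical', 'string' or 'other' (hypothetical helper)."""
    # Check categoricals explicitly first, so the answer does not depend on
    # how a particular pandas version's is_string_dtype treats them.
    if isinstance(col.dtype, CategoricalDtype):
        return "categorical"
    if is_string_dtype(col.dtype):
        return "string"
    return "other"
```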

The fix needed four parts:

  1. I fixed the tests so they actually work (they had been broken forever because they used a hardcoded file name instead of tmp_path and therefore reused the same file)
  2. I pulled his changes, which covered the writing portion of the needed fixes
  3. I fixed the reading portion in scverse/anndata@4c81631
  4. I fixed the highly variable genes function, which relied on slightly different Series behavior in pandas 0.23
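Item 1 refers to pytest's built-in tmp_path fixture, which gives every test its own fresh directory, so a test cannot accidentally read a file left behind by another test. A minimal sketch of a round-trip test in that style follows; pickle stands in for anndata's own file format here, and the test name is hypothetical.

```python
from pathlib import Path

import pandas as pd


def test_categorical_roundtrip(tmp_path: Path) -> None:
    # tmp_path is a fresh per-test directory provided by pytest, so this file
    # name can never collide with one written by another test.
    path = tmp_path / "roundtrip.pkl"
    df = pd.DataFrame({"cell_type": pd.Categorical(["B", "T", "B"])})
    df.to_pickle(path)
    back = pd.read_pickle(path)
    # The categorical dtype must survive the write/read cycle.
    assert isinstance(back["cell_type"].dtype, pd.CategoricalDtype)
    pd.testing.assert_frame_equal(df, back)
```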

@rjpbonnal

rjpbonnal commented Feb 4, 2019

Dear all,
when running the tutorial pbmc3k.ipynb,

I get an error similar to the one above:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-ea8d9dc47463> in <module>
----> 1 sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
    115         # a normalized disperion of 1
    116         one_gene_per_bin = disp_std_bin.isnull()
--> 117         gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
    118         if len(gen_indices) > 0:
    119             logg.msg(

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/series.py in reindex(self, index, **kwargs)
   3732     @Appender(generic.NDFrame.reindex.__doc__)
   3733     def reindex(self, index=None, **kwargs):
-> 3734         return super(Series, self).reindex(index=index, **kwargs)
   3735 
   3736     def drop(self, labels=None, axis=0, index=None, columns=None,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4354         # perform the reindex on the axes
   4355         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356                                   fill_value, copy).__finalize__(self)
   4357 
   4358     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4367             ax = self._get_axis(a)
   4368             new_index, indexer = ax.reindex(labels, level=level, limit=limit,
-> 4369                                             tolerance=tolerance, method=method)
   4370 
   4371             axis = self._get_axis_number(a)

~/jupyterminiconda3/envs/scanpy137/lib/python3.6/site-packages/pandas/core/indexes/category.py in reindex(self, target, method, level, limit, tolerance)
    501         else:
    502             if not target.is_unique:
--> 503                 raise ValueError("cannot reindex with a non-unique indexer")
    504 
    505             indexer, missing = self.get_indexer_non_unique(np.array(target))

ValueError: cannot reindex with a non-unique indexer

These are the packages I have installed for this analysis, via both conda and pip.

@LuckyMD
Contributor

LuckyMD commented Feb 4, 2019

Hi @helios,

You will have to install scanpy from github to use the fix for this. The latest release (1.3.7) does not yet include the fix.

@rjpbonnal

Hi @LuckyMD,
thanks. I was doing exactly that. I can confirm that everything works as expected.

@falexwolf
Member

@flying-sheep, so we essentially need a 1.3.8 release now, right?

@flying-sheep
Member

Yes. Do you think scanpy is quality-controlled enough that we can cut new releases whenever we please? Otherwise, I’m not comfortable just creating a new tag from master and releasing it by myself.

@falexwolf
Member

Good, yes: by now, test coverage should be high enough. I can't think of any major gaps anymore. Still, it would be nice to briefly coordinate releases for Scanpy, at least for now. But yes, in this case, please make release 1.3.8!

@flying-sheep
Member

I did, but then I realized that we also have the import scanpy as sc change and more features, so I called it 1.4. By the way, could you please add me as an owner of scanpy and anndata on PyPI? Then I can manage releases and delete files on PyPI.

@falexwolf
Member

Releasing 1.4 right now wasn't entirely planned, but that's fine; we already have some new stuff 😄 I briefly tweeted about it: https://twitter.com/falexwolf/status/1093140419997822978
