ValueError: numpy.ndarray size changed when calling import hdbscan #457

Open
doctor3030 opened this issue Jan 31, 2021 · 76 comments

Comments

@doctor3030

When I try to import hdbscan, I get the following error:

`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
1 from sklearn.decomposition import PCA
2 import umap
----> 3 import hdbscan
4 from hyperopt import fmin, tpe, atpe, rand, hp, STATUS_OK, Trials, SparkTrials
5 import pickle

c:\program files\python37\lib\site-packages\hdbscan\__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
3 from .validity import validity_index
4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors
5

c:\program files\python37\lib\site-packages\hdbscan\hdbscan_.py in <module>
19 from scipy.sparse import csgraph
20
---> 21 from ._hdbscan_linkage import (single_linkage,
22 mst_linkage_core,
23 mst_linkage_core_vector,

hdbscan\_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`

I use:
python 3.7.9
numpy 1.19.3 (I also tried 1.19.5)

I would appreciate your help.

@omarsumadi

omarsumadi commented Feb 1, 2021

Having this exact same issue as of yesterday on Python 3.8, with any Numpy version from the past year.

@Augusttell

Also having this issue. I tried Numpy versions 1.20 and 1.16.1.

@paulthemagno

The same with Python 3.7.9 in my case. It's now working with Python 3.7.6 for me.

@omarsumadi

omarsumadi commented Feb 1, 2021

I fixed it by reinstalling the package with pip install, adding the flags --no-cache-dir --no-binary :all:
Apparently this makes pip rebuild the package from source against your local version of Numpy instead of reusing a prebuilt wheel.

I honestly have no idea why this is happening, here and in other packages I use. Perhaps someone re-compiled the Cython scripts and didn't add a changelog entry. I'm shooting completely blind here, though.
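
A minimal sketch of that workaround driven from Python, assuming pip is available for the current interpreter. The helper names `build_source_install_command` and `reinstall_from_source` are hypothetical; only the pip flags come from the comment above.

```python
import subprocess
import sys
from typing import List


def build_source_install_command(package: str) -> List[str]:
    """Return a pip command that installs `package` from source (hypothetical helper)."""
    return [
        sys.executable, "-m", "pip", "install", package,
        "--no-cache-dir",        # ignore wheels cached by earlier installs
        "--no-binary", ":all:",  # refuse prebuilt wheels and build from source
    ]


def reinstall_from_source(package: str) -> int:
    """Run the command above and return pip's exit code."""
    return subprocess.run(build_source_install_command(package)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_source_install_command("hdbscan")))
```

Note that `:all:` also applies to dependencies, so Numpy and Scipy may get built from source too. Passing the package name instead (`--no-binary hdbscan`) limits the source build to that one package.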

@Augusttell

Recompiling also worked for me. I'm using a public cloud that messes with compilation.

@omarsumadi

omarsumadi commented Feb 1, 2021

> Recompiling also worked for me. I'm using a public cloud that messes with compilation.

But does anyone know WHY this is actually happening? Especially on different projects as well outside of this repo?
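
One plausible reading of the message, offered as a sketch rather than a confirmed diagnosis: the compiled extension remembers the size of the ndarray struct from the Numpy headers it was built with ("from C header") and compares it at import time with the size reported by the installed Numpy ("from PyObject"). The helper names below are hypothetical, and the "larger means newer" rule is an assumption based on the struct having grown between Numpy releases.

```python
import re
from typing import Optional, Tuple

_PATTERN = re.compile(r"Expected (\d+) from C header, got (\d+) from PyObject")


def parse_size_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return (size_at_build_time, size_at_run_time), or None if not this error."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe(message: str) -> str:
    """Give a rough interpretation of the mismatch (assumes the struct only grows)."""
    sizes = parse_size_mismatch(message)
    if sizes is None:
        return "not a size-mismatch error"
    built, installed = sizes
    if built > installed:
        return "extension was built against a newer Numpy than the one installed"
    if built < installed:
        return "extension was built against an older Numpy than the one installed"
    return "sizes match"
```

Under that reading, "Expected 88 from C header, got 80 from PyObject" would mean the extension was compiled against a newer Numpy than the one in the environment. The installed value can be checked with `numpy.ndarray.__basicsize__`.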

@paulthemagno

@omarsumadi can you explain how to do that? I put --no-cache-dir --no-binary :all: at the end of all my pip install lines, but it didn't work on Python 3.7.9.

@omarsumadi

@paulthemagno Take a look at this Stack Overflow post: https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility

Realistically, the only thing you would change would be: pip install hdbscan --no-cache-dir --no-binary :all:

If that doesn't work, I'm not sure. Try not setting a version of Numpy to install and letting Pip reconcile which Numpy should be installed if you are using multiple packages that rely on Numpy. Perhaps your issue is a bit deeper.

The way to actually solve all this though is to figure out why this happened in the first place.

@ymwdalex

ymwdalex commented Feb 1, 2021

I use another package, https://github.com/ing-bank/sparse_dot_topn, with cython and numpy. Since today/yesterday, I get exactly the same error: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject.

My environment is aws/codebuild/amazonlinux2-x86_64-standard:3.0. I downgraded the numpy version and it still doesn't work.

pip install package --no-cache-dir --no-binary :all: fixed the problem. FYI.

@omarsumadi

omarsumadi commented Feb 1, 2021

@ymwdalex That's actually the same package I came to this thread for. I don't have hdbscan installed, but came to help because I was trying to solve the sparse_dot_topn package issue.

Do you know why this is happening? I really don't want to have another go at fixing this bug with no idea where to start.

We could start by asking them. Or maybe scipy (a dependency of both) decided to re-compile its wheels against a different version of Numpy and everything broke?

@ymwdalex

ymwdalex commented Feb 1, 2021

@omarsumadi thanks for the comments. I am the author of sparse_dot_topn. I didn't change the source code recently and have no idea why this is happening...

@omarsumadi

@ymwdalex OK, that is kind of funny lol! By the way, hi! I love your work and everything that you have done. The library is truly one of a kind and I have not found anything that comes close to its capabilities, which is sort of why I have a vested interest in seeing this through.

I'll spill what I could figure out:

  • The only thing in common that both of these packages have is Numpy and Scipy
  • Scipy has a history of this happening in the past with other errors that are similar to this type. See - ValueError: numpy.ufunc size changed, may indicate binary incompatibility #272.
  • Numpy Versioning seems to have an impact on these errors and Scipy is consistently causing issues.
  • Someone at Scipy must have tried to re-compile with a later version of Numpy that perhaps broke something.

Again, this kind of thing is way outside of my comfort zone (I know nothing about Cython and Numpy cross-over), but perhaps we could find the version of Numpy that was used to compile the wheels and pin that as the version for your library?

Sorry if some of this doesn't make much sense.

@doctor3030
Author

> The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.

I eventually installed Python 3.7.6 and everything worked. However, I have another machine with 3.7.9 where everything works fine, so I don't think it's related to the Python version.

@omarsumadi

omarsumadi commented Feb 2, 2021

@doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling in cross-package discussion.

I think there's a lot of cross interest figuring out what exactly happened as well. Unfortunately, I'm not well versed enough in Cython and Numpy internals to offer the correct solution other than to rebuild the wheels.

Thanks,
Omar

@doctor3030 doctor3030 reopened this Feb 2, 2021
@doctor3030
Author

> @doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling in cross-package discussion.
>
> I think there's a lot of cross interest figuring out what exactly happened as well.
>
> Thanks,
> Omar

OK, let's keep it open.

@omarsumadi

Here's what I can say: according to the issue referenced above (Trusted-AI/adversarial-robustness-toolbox#87), someone reports that Numpy 1.20.0 works (probably because that is what Scipy is now compiled against, due to some change that is now impacting all of us).

What is most likely happening to us is that we are using packages that limit the Numpy version to something below 1.20.0 (such as Tensorflow).

Perhaps someone could verify the pull I linked?

@cavvia

cavvia commented Feb 2, 2021

I have this issue when trying to use Top2Vec on Python 3.7.9, which pulls in Tensorflow and locks me to Numpy 1.19. Rebuilding HDBScan from source in turn fails on this Accelerate error, so I think I have to rebuild NumPy from source with OpenBLAS (although NumPy is otherwise working fine), which in turn is proving difficult.

So this is still very much an issue for me, no doubt for some others too.

@paulthemagno

@cavvia I get the same with a similar library, BERTopic! I also tried pip install package --no-cache-dir --no-binary :all: but it didn't change anything. In my case the problem occurs on Python 3.7.9, while on Python 3.7.6 it works well.

@AltfunsMA

I can report the same issue as @cavvia after trying to use top2vec on 3.8.0 and on 3.7.5... encountering issues with UMAP when trying to work around it...

@x1s

x1s commented Feb 2, 2021

Hello guys, we've been facing the same issue since last weekend, with no changes to the code or any library versions.

I isolated it to check what could be happening:

Dockerfile

FROM python:3.7-slim-buster
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3.7-dev=3.7.3-2+deb10u2 build-essential=12.6 jq=1.5+dfsg-2+b1 curl=7.64.0-4+deb10u1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --upgrade pip
COPY . .
RUN python -m pip install --user -r requirements.txt
CMD ["python", "test.py"]

requirements.txt

hdbscan==0.8.26
numpy==1.18.5

test.py

import hdbscan

print("hello")

outputs

$ docker run 9523faa77267 python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

It works with numpy==1.20 though.

The point is, as mentioned here before, we use tensorflow in our project and it locks us to numpy<1.19.

I'm new to the python/pypi world, but I assumed that built wheels couldn't be updated (recompiled with updated libraries/dependencies) and that if an update was needed, a new release would be drafted with a minor change.

Is there anything else we can help with? I couldn't work out exactly which lib was recompiled (hdbscan or scipy?), but I noticed a difference in the checksum/size of the hdbscan wheel between builds. I'm not sure it's related.

# last week (when everything worked)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=687506 sha256=bd8b0c65d14ffa1d804f4a3df445fc4300452968a2372d581f0bb64963a8010d
# yesterday (when the error started happening)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=686485 sha256=05668339290a597a871ee90da2b50a7ca415f18b82dba59ad6c08bb9b5b9192f
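
To compare two wheel files directly, a small sketch like the following could be used. The helper names `wheel_fingerprint` and `same_build` are hypothetical, and the paths passed in are whatever wheels you have kept from each build.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def wheel_fingerprint(path: str) -> Tuple[int, str]:
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_build(path_a: str, path_b: str) -> bool:
    """True when both files have identical size and SHA-256 digest."""
    return wheel_fingerprint(path_a) == wheel_fingerprint(path_b)
```

The "Created wheel for ..." lines are printed when pip builds a wheel locally from the source distribution, so a changed hash there may simply reflect different build inputs (for example a different Numpy in the build environment) rather than a re-uploaded release.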

@ymwdalex

ymwdalex commented Feb 2, 2021

@omarsumadi Thanks a lot for your investigation. I also opened an issue in the sparse_dot_topn package that refers to this issue.

numpy 1.20.0 works for me.

In my environment that has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20.

@rajatkumarraghuvanshi1

Make sure that you use correct and compatible versions of the libs:

annoy==1.17.0
cython==0.29.21
fuzzywuzzy==0.18.0
hdbscan==0.8.26
joblib==1.0.0
kiwisolver==1.3.1
llvmlite==0.35.0
matplotlib==3.3.2
numba==0.52.0
numpy==1.20.0
pandas==1.1.2
pillow==8.1.0
pyarrow==1.0.1
python-levenshtein==0.12.1
pytz==2021.1
scikit-learn==0.24.1
scipy==1.6.0
six==1.15.0
threadpoolctl==2.1.0
tqdm==4.50.0
umap-learn==0.5.0
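
To check an environment against a list of pins like the one above, a sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`), and `find_pin_mismatches` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict


def find_pin_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    # A few of the pins from the list above; extend as needed.
    print(find_pin_mismatches({"numpy": "1.20.0", "scipy": "1.6.0", "hdbscan": "0.8.26"}))
```

This reads package metadata only, so it works even when `import hdbscan` itself fails.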

@omarsumadi

omarsumadi commented Feb 2, 2021

> @omarsumadi Thanks a lot for your investigation. I also opened an issue in the sparse_dot_topn package that refers to this issue.
>
> numpy 1.20.0 works for me.
>
> In my environment that has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20.

@ymwdalex
The alternative is to either downgrade Scipy as well and keep the current Numpy version, or install with --no-binary :all:. The problem is, I'd bet a lot of people are going to use some other pip package that doesn't support Numpy 1.20.0 (big hint: Tensorflow), especially since the new version number represents a step up, so many people may have < 1.20.0 in their setups.

@lmcinnes
Collaborator

lmcinnes commented Feb 2, 2021

I admit that I am as much at a loss as everyone else here. In fact I have little understanding of the binary wheel infrastructure on PyPI. I have not provided any new packages or wheels for hdbscan recently (i.e. within the last many months), so if there is a change it was handled by some automated process. Compiling from source (and, in fact, re-cythonizing everything) is likely the best option, but that does not leave a great install option. Any assistance from anyone with more experience in packaging than me would be greatly appreciated.

@swang423

swang423 commented Jan 5, 2022

@sgbaird If numba is the only thing that's bothering you, try downgrading numba to 0.53 first, then upgrading numpy to 1.22.0.
https://stackoverflow.com/questions/70148065/numba-needs-numpy-1-20-or-less-for-shapley-import

@sgbaird

sgbaird commented Jan 5, 2022

@swang423 thank you! This did the trick to get my pip-based pytest unit tests on GitHub Actions back up and running: pip install numba==0.53.* numpy==1.22.0

@eterna2

eterna2 commented Jan 6, 2022

@MaartenGr

I have the same issue. And numpy==1.22.0 causes a bug in umap when you use cosine distance. So now if hdbscan is working, umap is not. If umap is working, I cannot get hdbscan to work.

lmcinnes/pynndescent#163

:(

@RajamannarAanjaram

I face the same issue.

I tried installing with --no-cache-dir --no-binary :all: --no-build-isolation, and with a pyproject.toml as well, but I'm still getting the same error.

python==3.8.10
numpy==1.22.0
umap-learn==0.5.1
hdbscan==0.8.27

For some weird reason, when I install these packages with conda install I don't get this error, but it fails with pip install. The only difference is the numpy version (1.20.3).

@juanroesel

@sgbaird @swang423 @MaartenGr Thanks for sharing all your input! pip install numba==0.53.* numpy==1.22.0 seems to have done the trick for me too when trying to import BERTopic inside a Jupyter Notebook instance. Topic models are training just fine now.

bertopic                  0.9.4                    pypi_0
hdbscan                   0.8.27                   pypi_0
numba                     0.53.0                   pypi_0
numpy                     1.22.0                   pypi_0 
pip                       21.2.4           py39hecd8cb5_0 
python                    3.9.7                h88f2d9e_1
pyyaml                    5.4.1                    pypi_0
toml                      0.10.2                   pypi_0
umap-learn                0.5.2                    pypi_0

@ShorelLee

I also have this problem.
When I import hdbscan in a script and run it, I get the following error:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
I did some experiments and found that this seems to be a problem with the hdbscan package itself and has nothing to do with the version of numpy.
If you installed hdbscan in your virtual environment with pip install hdbscan, uninstall it and then reinstall it with conda install -c conda-forge hdbscan.
Hope this solves your problem!

@MaartenGr

The issue turned out to be a fair bit less complex than I had thought 😅 The PyPI release does not yet have oldest-supported-numpy in its pyproject.toml. The master branch does have that fix, so simply using hdbscan from the master branch fixes the issue for me.

@lmcinnes Sorry to tag you like this, but it seems that the issue should be solved whenever a new version is released on PyPI. Fortunately, this also means that after that release we will likely not see this issue popping up anymore.
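
A rough way to check whether a given pyproject.toml declares that build requirement, as a sketch. The function names are hypothetical, and on interpreters without `tomllib` (before Python 3.11) the code falls back to a naive text scan that is not a real TOML parser.

```python
import re
from typing import List

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    tomllib = None


def build_requirements(pyproject_text: str) -> List[str]:
    """Return the [build-system] requires list from pyproject.toml text."""
    if tomllib is not None:
        data = tomllib.loads(pyproject_text)
        return list(data.get("build-system", {}).get("requires", []))
    # Fallback for older interpreters: naive scan of the first requires array.
    match = re.search(r"requires\s*=\s*\[(.*?)\]", pyproject_text, re.DOTALL)
    return re.findall(r"""["']([^"']+)["']""", match.group(1)) if match else []


def pins_oldest_numpy(pyproject_text: str) -> bool:
    """True when oldest-supported-numpy is among the build requirements."""
    return any(
        req.strip().lower().startswith("oldest-supported-numpy")
        for req in build_requirements(pyproject_text)
    )
```

Building against the oldest supported Numpy is meant to produce binaries that also load on newer Numpy versions, which is why its absence from the build requirements matters here.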

@BhujayKumarBhatta

I faced the same issue while working in an Anaconda environment. I left the conda environment and created a plain venv with Python 3.9.7, installed hdbscan using pip, and generated the requirements file. I then created a fresh conda env and installed hdbscan from that requirements file. I am able to use it now.

(pelog39) u1@ubuntu:~$ cat hdbscan_requirement.txt
Cython==0.29.26
hdbscan==0.8.27
joblib==1.1.0
numpy==1.22.0
scikit-learn==1.0.2
scipy==1.7.3
six==1.16.0
threadpoolctl==3.0.0
(pelog39) u1@ubuntu:~$ pip install -r hdbscan_requirement.txt
(pelog39) u1@ubuntu:~$ python -c 'import hdbscan'
(pelog39) u1@ubuntu:~$

for hdbscan to work with pytorch :
conda install -c conda-forge hdbscan
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
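
The `python -c 'import hdbscan'` check above can be wrapped for use in CI, as a sketch. `try_import` is a hypothetical helper; it runs the import in a fresh interpreter so that a failing compiled extension cannot affect the calling process.

```python
import subprocess
import sys
from typing import Tuple


def try_import(module: str) -> Tuple[bool, str]:
    """Import `module` in a fresh interpreter; return (succeeded, last stderr line)."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    ok, last_line = try_import("hdbscan")
    print("import ok" if ok else f"import failed: {last_line}")
```

When the binary incompatibility is present, the last stderr line should be the `ValueError: numpy.ndarray size changed ...` message quoted throughout this thread.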

@tocom242242

In my env, numpy==1.21.5 works

fbunt added a commit to UM-RMRS/raster_tools that referenced this issue Jan 21, 2022
Pip installs the deps as specified in the requirements file. When the
cython modules are built, however, pip installs the latest version of
numpy, ignoring the specified version, and then builds against it.
This creates invalid shared objects that can't be used. This would not
be an issue normally but for a confluence of circumstances that expose
the bug in pip. These are: Numpy recently changed the C API and
Numba is incompatible with numpy > 1.21. Below are reference links.

pypa/pip#9542
scikit-learn-contrib/hdbscan#457 (comment)
https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp
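
One way to work around the build behaviour described in that commit message, sketched under assumptions: install the pinned Numpy first, then build the Cython package with pip's build isolation turned off so the build sees the Numpy already in the environment. `pinned_build_plan` is a hypothetical helper, and the list of build tools is a guess at what such a package needs. At least one commenter above reports that these flags did not help in their setup.

```python
import sys
from typing import List


def pinned_build_plan(package: str, numpy_pin: str) -> List[List[str]]:
    """Return pip commands that build `package` against a pinned Numpy (sketch)."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        # Step 1: install the Numpy version the runtime environment will use.
        pip + [f"numpy=={numpy_pin}"],
        # Step 2: install build tools by hand, since pip will not fetch them
        # once build isolation is disabled (assumed set of tools).
        pip + ["cython", "setuptools", "wheel"],
        # Step 3: build the package from source in the current environment.
        pip + [package, "--no-cache-dir", "--no-binary", package,
               "--no-build-isolation"],
    ]


if __name__ == "__main__":
    for command in pinned_build_plan("hdbscan", "1.19.5"):
        print(" ".join(command))
```

The plan is only printed here; each command can be passed to `subprocess.run` once it has been reviewed.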
asross added a commit to pyqg/pyqg that referenced this issue Mar 5, 2022
Setuptools needs to use the proper version of numpy (which now must be
>1.20) while building the project, or we'll get C errors on import.

For more details, see:
- scikit-learn-contrib/hdbscan#457 (comment)
- https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp
rabernat pushed a commit to pyqg/pyqg that referenced this issue Mar 23, 2022
* First pass at versioneer -> setuptools_scm

* Fix thorny numpy/Cython build requirement issues

Setuptools needs to use the proper version of numpy (which now must be
>1.20) while building the project, or we'll get C errors on import.

For more details, see:
- scikit-learn-contrib/hdbscan#457 (comment)
- https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

* Try adding pyfftw to build requirements

* update documentation

* Remove extra versioneer cruft

* Attempt to fix docs / python <3.8 compatibility

* Try bumping doc dependencies and explicitly requiring pyfftw

* Use pyfftw environment in CI

* Explicitly include importlib_metadata for compatibility older versions of python 3.x

* Update installation instructions to reflect pyfftw changes
@chaituValKanO

chaituValKanO commented Apr 1, 2022

> @swang423 thank you! This did the trick to get my GitHub actions, pip-based pytest unit tests back up and running. pip install numba==0.53.* numpy==1.22.0

This worked for me. Below is my env.yml (incomplete). I had the numba issue as well, as someone mentioned above. Everything got fixed with the versions below:

  • pandas==1.2.4
  • numpy==1.22.0
  • numba==0.53.*
  • hdbscan==0.8.27
  • umap==0.1.1
  • umap_learn==0.5.1

copybara-service bot pushed a commit to tensorflow/privacy that referenced this issue Aug 16, 2022
* Updated the numpy version.
* Synced the pandas version.

In Python 3.10, if you invoke `pip install pandas~=1.1.4 numpy~=1.21.4` and then `import pandas` you get the following error:

```
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.10/site-packages/pandas/__init__.py", line 30, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/tmp/venv/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```

I believe that this is the cause of the issue scikit-learn-contrib/hdbscan#457 (comment)

PiperOrigin-RevId: 467785781
copybara-service bot pushed a commit to tensorflow/privacy that referenced this issue Aug 16, 2022
PiperOrigin-RevId: 467952859
@thedatadecoder

Downgrading to a suitable hdbscan version helped me here. Use trial and error to find the appropriate version.
The following versions worked for me:
%pip install hdbscan==0.8.33
%pip install numpy==1.20.3
