numpy causing various errors #125

Closed
AltfunsMA opened this issue Feb 2, 2021 · 12 comments

@AltfunsMA

AltfunsMA commented Feb 2, 2021

I've been having trouble with numpy when using Top2Vec version 1.0.20 with Python 3.8.0 on Ubuntu 18.04, and I experience the same problems using Python 3.7.5. I've tried installing numpy 1.0.20 and numpy 1.19.5.

See this issue for the HDBSCAN error, and this issue for the UMAP error.

UMAP

PicklingError:

(snip)

/data/.top2vec/lib/python3.8/site-packages/umap/umap_.py in fit(self, X, y)
   2571 
   2572         numba.set_num_threads(self._original_n_threads)
-> 2573         self._input_hash = joblib.hash(self._raw_data)
   2574 
   2575         return self

/data/.top2vec/lib/python3.8/site-packages/joblib/hashing.py in hash(obj, hash_name, coerce_mmap)
    259     else:
    260         hasher = Hasher(hash_name=hash_name)
--> 261     return hasher.hash(obj)

/data/.top2vec/lib/python3.8/site-packages/joblib/hashing.py in hash(self, obj, return_digest)
     61     def hash(self, obj, return_digest=True):
     62         try:
---> 63             self.dump(obj)
     64         except pickle.PicklingError as e:
     65             e.args += ('PicklingError while hashing %r: %r' % (obj, e),)

(snip)

PicklingError: ("Can't pickle <class 'numpy.dtype[float32]'>: it's not found as numpy.dtype[float32]", 'PicklingError while hashing array([[ 0.002187  , -0.00357572, -0.00279311, ...,  0.00120361,\n        -0.00115495,  0.00059189],\n       [-0.05823869,  0.01436491,  0.02220243, ...,  0.00703284,\n        -0.01716192, -0.01003473],\n       [-0.00334117,  0.00051066,  0.00269544, ...,  0.00070796,\n        -0.00202038, -0.00233051],\n       ...,\n       [ 0.00062888,  0.0027382 ,  0.0044361 , ..., -0.00229976,\n         0.00057765, -0.00033288],\n       [-0.00081269,  0.00099852, -0.00054314, ...,  0.00133646,\n        -0.00026089, -0.00150439],\n       [-0.01297437,  0.0104734 ,  0.01563089, ..., -0.00051685,\n        -0.00144138, -0.00556232]], dtype=float32): PicklingError("Can\'t pickle <class \'numpy.dtype[float32]\'>: it\'s not found as numpy.dtype[float32]")')

HDBSCAN

from top2vec import Top2Vec

(snip)

/data/.top2vec/lib/python3.8/site-packages/hdbscan/hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20 
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan/_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

@ddangelov
Owner

Make sure you have joblib<1.0.0. I would create a fresh environment and do pip install top2vec.

@yamengzhang

I have exactly the same error

@AltfunsMA
Author

AltfunsMA commented Feb 3, 2021

Make sure you have joblib<1.0.0. I would create a fresh environment and do pip install top2vec.

Thanks @ddangelov, and sorry to bring this up here. I'm not sure it's a real issue for your package, but I thought I'd leave it, as others may end up here as well and could be redirected to the 'real' issues.

All of the above was attempted in fresh venv environments each time, also avoiding binaries and cached packages, to no avail. I never checked which joblib version was installed, and I don't think it has been mentioned elsewhere. Is there any particular reason to focus on that?

@ddangelov
Owner

joblib 1.0.0 causes errors similar to the above. I will have to look into this. It seems to be a UMAP and HDBSCAN issue with numpy and joblib. It would help to know which UMAP and HDBSCAN versions people experiencing this issue have installed.
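
For anyone reporting versions, a minimal sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` and the package list are only illustrative and not part of Top2Vec.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names as published on PyPI; adjust the list as needed.
PACKAGES = ["numpy", "joblib", "umap-learn", "hdbscan", "numba", "top2vec"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it inside the affected environment (the same venv or Jupyter kernel that raises the error) gives a short list that can be pasted into a report.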

@ddangelov
Owner

I just tried a fresh install of Top2Vec and it ran without errors. I had the following versions:
numpy==1.19.2
umap-learn==0.5.0
hdbscan==0.8.26
joblib==0.17.0

@schwabPhysics

schwabPhysics commented Feb 3, 2021

I have the same error stemming from hdbscan while using jupyter lab. The versions of numpy, umap-learn, hdbscan, and joblib are identical to those listed in @ddangelov's comment above (numpy 1.19.2, umap-learn 0.5.0, hdbscan 0.8.26, joblib 0.17.0).

@ddangelov
Owner

I ran the above with Python 3.8.5, here are all the dependencies:

appnope==0.1.2
backcall==0.2.0
bleach==3.1.4
certifi==2020.12.5
colorama==0.4.3
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
docutils==0.16
gensim==3.8.3
hdbscan==0.8.26
ipykernel==5.3.4
ipython==7.20.0
ipython-genutils==0.2.0
jedi==0.17.0
joblib==0.17.0
jupyter-client==6.1.7
jupyter-core==4.7.1
keyring==21.2.0
kiwisolver==1.3.1
llvmlite==0.35.0
matplotlib==3.3.4
mkl-fft==1.2.0
mkl-random==1.1.1
mkl-service==2.3.0
numba==0.52.0
numpy==1.19.2
pandas==1.2.1
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
pip==20.3.3
pkginfo==1.5.0.1
prompt-toolkit==3.0.8
ptyprocess==0.7.0
Pygments==2.7.4
pynndescent==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
pyzmq==20.0.0
readme-renderer==25.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
scikit-learn==0.24.1
scipy==1.6.0
setuptools==50.3.0
six==1.15.0
smart-open==4.1.2
threadpoolctl==2.1.0
top2vec==1.0.20
tornado==6.1
tqdm==4.43.0
traitlets==5.0.5
twine==3.2.0
umap-learn==0.5.0
wcwidth==0.2.5
webencodings==0.5.1
wheel==0.35.1
wordcloud==1.8.1

ddangelov added a commit that referenced this issue Feb 5, 2021
@ddangelov
Owner

The new version of Top2Vec has numpy==1.19.2 and joblib<1.0.0 as requirements. These should resolve the issue.
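
As a rough illustration of those two pins, here is a small check that could be run against version strings from an environment. The helpers `version_tuple` and `satisfies_pins` are hypothetical, and the parsing is deliberately simplistic (numeric dotted versions only); a real check would use a proper version parser.

```python
def version_tuple(version):
    """Turn '1.19.2' into (1, 19, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def satisfies_pins(numpy_version, joblib_version):
    """Check numpy==1.19.2 and joblib<1.0.0, the pins quoted in this thread."""
    return (
        version_tuple(numpy_version) == (1, 19, 2)
        and version_tuple(joblib_version) < (1, 0, 0)
    )


if __name__ == "__main__":
    print(satisfies_pins("1.19.2", "0.17.0"))  # versions from the working install
    print(satisfies_pins("1.19.5", "1.0.0"))   # a combination outside the pins
```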

@ddangelov
Owner

ddangelov commented Feb 7, 2021

As mentioned above, this seems to be an ongoing issue. The UMAP and HDBSCAN issues linked in the original post contain a lot of detail.

@AltfunsMA
Author

AltfunsMA commented Feb 11, 2021

I finally had some time to try this and got it working by drawing on this comment, as follows:

I used Dimo's freeze as requirements.txt but removed all the mkl packages, which were not available on PyPI, to yield the list at the bottom.

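cd a/suitable/path/for/you
# create and activate a fresh virtual environment
python -m venv t2v
source t2v/bin/activate
# install the pinned requirements without using pip's cache
pip install -r requirements.txt --no-cache-dir
# rebuild hdbscan from source against the numpy already installed in the venv
pip uninstall hdbscan
pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation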

I'm pretty sure it was the --no-build-isolation flag, used when rebuilding hdbscan, that did the trick, so you can probably just do pip install top2vec instead of using a requirements file, as long as you uninstall and rebuild hdbscan locally afterwards.

I used Python 3.8.0 but again I'm pretty sure it doesn't really matter much.

I had intended to pin the version with pip install hdbscan==0.8.26 but forgot, and it seems to work fine with hdbscan-0.8.27 and joblib-1.0.1. I haven't tried anything beyond the basic example using my own data, but I don't suppose it'll break.
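
The uninstall-and-rebuild step can also be scripted. The following is only a sketch: the function names `rebuild_commands` and `rebuild` are made up for illustration, the pip flags are the ones used in the commands above (plus `--yes` to skip the uninstall prompt), and nothing is executed unless `dry_run=False` is passed.

```python
import subprocess
import sys


def rebuild_commands(package="hdbscan"):
    """Return the pip invocations that uninstall `package` and rebuild it from source."""
    # Use the current interpreter's pip so the active venv is the one modified.
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "--yes", package],
        pip + ["install", package, "--no-cache-dir",
               "--no-binary", ":all:", "--no-build-isolation"],
    ]


def rebuild(package="hdbscan", dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in rebuild_commands(package):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    rebuild()  # dry run: only prints the two pip commands
```

Because `--no-build-isolation` makes pip build with the packages already present in the environment, numpy and Cython need to be installed before the rebuild step runs.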

----- requirements.txt
appnope==0.1.2
backcall==0.2.0
bleach==3.1.4
certifi==2020.12.5
colorama==0.4.3
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
docutils==0.16
gensim==3.8.3
hdbscan==0.8.26
ipykernel==5.3.4
ipython==7.20.0
ipython-genutils==0.2.0
jedi==0.17.0
joblib==0.17.0
jupyter-client==6.1.7
jupyter-core==4.7.1
keyring==21.2.0
kiwisolver==1.3.1
llvmlite==0.35.0
matplotlib==3.3.4
numba==0.52.0
numpy==1.19.2
pandas==1.2.1
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
pip==20.3.3
pkginfo==1.5.0.1
prompt-toolkit==3.0.8
ptyprocess==0.7.0
Pygments==2.7.4
pynndescent==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
pyzmq==20.0.0
readme-renderer==25.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
scikit-learn==0.24.1
scipy==1.6.0
setuptools==50.3.0
six==1.15.0
smart-open==4.1.2
threadpoolctl==2.1.0
top2vec==1.0.20
tornado==6.1
tqdm==4.43.0
traitlets==5.0.5
twine==3.2.0
umap-learn==0.5.0
wcwidth==0.2.5
webencodings==0.5.1
wheel==0.35.1
wordcloud==1.8.1

@ddangelov
Owner

With version 1.0.23 of Top2Vec these issues should hopefully be resolved as it uses the updated versions of UMAP and HDBSCAN that have addressed these issues as well as numpy>=1.20.0.

@j1n6

j1n6 commented May 21, 2021

It looks like the issue is related to the ABI change in numpy version 1.20.0. Compiled extensions such as hdbscan would need to be rebuilt against the numpy version that is actually installed.
numpy/numpy#16938
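
To tell the two failure modes in this thread apart, a small diagnostic along these lines could be used. It is a sketch only: `classify_error` and `try_import` are hypothetical helpers, and the matching is based purely on the error messages quoted in the original post.

```python
import importlib


def classify_error(message):
    """Return a short label for the error messages quoted in this thread."""
    if "numpy.ndarray size changed" in message or "binary incompatibility" in message:
        # The extension was compiled against a different numpy ABI.
        return "binary-incompatibility"
    if "Can't pickle" in message and "numpy.dtype" in message:
        # The joblib hashing failure seen in the UMAP traceback.
        return "dtype-pickling"
    return "other"


def try_import(module_name):
    """Import a module and report 'ok', 'missing', or a classified error label."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return "missing"
    except (ValueError, ImportError) as exc:
        return classify_error(str(exc))
    return "ok"


if __name__ == "__main__":
    for name in ("numpy", "hdbscan", "umap"):
        print(name, "->", try_import(name))
```

A "binary-incompatibility" result for hdbscan suggests rebuilding it against the installed numpy, as described earlier in the thread.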
