
ValueError: numpy.ndarray size changed when calling import hdbscan #39

Closed
jo-mueller opened this issue Jan 26, 2022 · 2 comments · Fixed by #41
Labels
bug Something isn't working

Collaborator

jo-mueller commented Jan 26, 2022

I recently installed the clusters plotter and ran into an issue when running hdbscan. When I run the clustering on a set of measurements (e.g., based on blobs.gif), I receive this error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Here's a screenshot of the setup

[screenshot "Unbenannt": the clustering widget with HDBSCAN selected]

and the complete traceback:

File e:\biapol\projects\napari-clusters-plotter\napari_clusters_plotter\_clustering.py:274, in ClusteringWidget.run(self=<napari_clusters_plotter._clustering.ClusteringWidget object>, labels_layer=<Labels layer 'labels'>, selected_measurements_list=['min_intensity', 'max_intensity', 'sum_intensity', 'area', 'mean_intensity', 'centroid_x', 'centroid_y', 'centroid_z', 'mean_distance_to_centroid', 'standard_deviation_intensity', 'max_distance_to_centroid', 'mean_max_distance_to_centroid_ratio'], selected_method='HDBSCAN', num_clusters=2, num_iterations=300, standardize=False, min_cluster_size=5, min_nr_samples=5)
    271     add_column_to_layer_tabular_data(labels_layer, "KMEANS_CLUSTER_ID_SCALER_" + str(standardize), y_pred)
    273 elif selected_method == "HDBSCAN":
--> 274     y_pred = hdbscan_clustering(standardize, selected_properties, min_cluster_size, min_nr_samples)
        selected_properties =     min_intensity  max_intensity  ...  max_distance_to_centroid  mean_max_distance_to_centroid_ratio
0           152.0          232.0  ...                 19.075121                             2.262548
1           152.0          224.0  ...                  9.869590                             1.911683
2           152.0          248.0  ...                 16.878197                             1.795291
3           152.0          248.0  ...                 12.767069                             1.674100
4           152.0          248.0  ...                 15.421144                             1.845683
..            ...            ...  ...                       ...                                  ...
57          152.0          224.0  ...                  8.849086                             1.716192
58          152.0          216.0  ...                 10.764353                             2.296404
59          152.0          248.0  ...                  8.658107                             2.178612
60          152.0          248.0  ...                  6.827476                             2.224879
61          152.0          224.0  ...                  7.477417                             2.171838

[62 rows x 12 columns]
        standardize = False
        min_cluster_size = 5
        min_nr_samples = 5
    275     print("HDBSCAN predictions finished.")
    276     # write result back to features/properties of the labels layer

File e:\biapol\projects\napari-clusters-plotter\napari_clusters_plotter\_clustering.py:305, in hdbscan_clustering(standardize=False, measurements=    min_intensity  max_intensity  ...  max_dista...                 2.171838

[62 rows x 12 columns], min_cluster_size=5, min_samples=5)
    304 def hdbscan_clustering(standardize, measurements, min_cluster_size, min_samples):
--> 305     import hdbscan
    306     print("HDBSCAN predictions started (standardize: " + str(standardize) + ")...")
    308     clustering_hdbscan = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, min_samples=min_samples)

File ~\anaconda3\envs\napari_clusters\lib\site-packages\hdbscan\__init__.py:1, in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index

File ~\anaconda3\envs\napari_clusters\lib\site-packages\hdbscan\hdbscan_.py:21, in <module>
     17 from joblib.parallel import cpu_count
     19 from scipy.sparse import csgraph
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,
     24                                label)
     25 from ._hdbscan_tree import (condense_tree,
     26                             compute_stability,
     27                             get_clusters,
     28                             outlier_scores)
     29 from ._hdbscan_reachability import (mutual_reachability,
     30                                     sparse_mutual_reachability)

File hdbscan/_hdbscan_linkage.pyx:1, in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I looked up the error and it probably boils down to this issue. I tried a few of the suggestions from this thread (e.g., rolling numpy back to 1.20.5), but couldn't solve the issue. I currently have numpy 1.21.5 installed.
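As an aside, a plugin could catch this specific `ValueError` at import time and surface a friendlier hint. A minimal sketch (the helper name `explain_abi_error` is hypothetical, not part of napari-clusters-plotter):

```python
def explain_abi_error(err: BaseException) -> str:
    """Map the numpy ABI-mismatch ValueError to an actionable hint.

    Hypothetical helper, not part of napari-clusters-plotter.
    """
    if "numpy.ndarray size changed" in str(err):
        return (
            "hdbscan's compiled extensions were built against a different "
            "numpy ABI; reinstall hdbscan from source or via conda-forge."
        )
    return str(err)

# Simulate the error from the traceback above:
err = ValueError(
    "numpy.ndarray size changed, may indicate binary incompatibility. "
    "Expected 96 from C header, got 88 from PyObject"
)
print(explain_abi_error(err))
```

In real code this would wrap the `import hdbscan` call in a `try`/`except ValueError` block and re-raise with the hint attached.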

Edit: It seems this will be solved once the new release of hdbscan is out (apparently it works when installing directly from the hdbscan master branch), but I haven't verified this.
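For reference, a commonly suggested workaround for this class of ABI-mismatch error (untested here) is to force pip to rebuild hdbscan from source against the numpy already present in the environment, rather than reusing a cached wheel:

```shell
# Rebuild hdbscan from source so its Cython extensions are compiled
# against the currently installed numpy (ignores cached/prebuilt wheels).
pip install --no-cache-dir --no-binary :all: hdbscan
```

This requires a working C compiler in the environment, which is why installing the conda-forge build is often the easier route on Windows.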

@jo-mueller jo-mueller added the bug Something isn't working label Jan 26, 2022
Collaborator

lazigu commented Jan 27, 2022

Hi Johannes @jo-mueller, thanks for reporting! I just tried to reproduce the error on two laptops and couldn't; the plugin works fine on both machines. One of them is quite old, so the setup there is a bit different: I create the environment with a lower Python version (3.8) and a lower pyopencl version (2020.1). On my laptop I always need to install hdbscan via conda before installing the plugin, because building wheels for hdbscan otherwise fails. Installing hdbscan via conda was also mentioned as a solution in the issue you linked, which might be why I am not seeing this error. Have you tried installing hdbscan that way?
These are the exact steps I follow, in case they are helpful:

```
conda create --name ncp-env python=3.9
conda install -c conda-forge pyopencl
python -m pip install "napari[all]"
conda install -c conda-forge hdbscan
pip install napari-clusters-plotter
```

I also have numpy 1.21.5, hdbscan 0.8.27, and numba 0.55.0 installed in that environment.

Collaborator Author

Hi Laura @lazigu,

Thanks for the quick fix, this solves it :) I'll make a small PR to add this to the troubleshooting section.
