MAINT: Support NumPy 2 and build Python 3.12 wheels #644
There seems to be a further test error:
That test error seems unrelated to NumPy 2 (I downgraded NumPy again to check), but I am not sure what it is related to, since downgrading scipy/sklearn didn't seem to change much. Some roundabout issue with Cython?
Ah, the … is specific to certain versions of Cython, I think.

… in …, since I suspect that was always a requirement.
This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single GPU test suite seems to be doing fine (not quite finished, I may push more commits, but can also open a new PR). The one thing I noticed that is a bit annoying is that hdbscan is not yet released for NumPy 2; is that actually still required, since I think sklearn has a version? (I don't expect this to be a problem for long, but there is at least one odd test failure trying to make hdbscan work in scikit-learn-contrib/hdbscan#644.)
```python
else:
    node_list = sorted(stability.keys(), reverse=True)[:-1]
    node_list = [int(n) for n in node_list]
```
I don't like it, but it works. Cython and NumPy have a weird interaction here, allowing conversion from float or not depending on both the Cython and NumPy versions.
If you want to convert from float, you could also just `cdef` it to `long` or `int`; IIRC that should always work, and I don't think you care about more than 2**31 clusters....
I do agree that there are some weird interactions here. It comes down to the fact that at some point the values are contained in a single numpy array of type float64 (since other values in the array are floats), and this was "fine", but it is causing snags now in the Cython/NumPy interactions. I think this workaround looks fine under the circumstances.
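A hedged illustration of the workaround being discussed: the `stability` dict below is made up, but it mimics the situation where cluster IDs are float64 values (because they once lived in a float64 array), and casting them to plain Python ints up front sidesteps the version-dependent float-to-integer coercion in Cython/NumPy.

```python
import numpy as np

# Made-up stability dict: cluster IDs ended up as float64 keys because
# they were stored in a float64 array alongside actual float values.
stability = {np.float64(0.0): 1.2, np.float64(1.0): 0.8, np.float64(2.0): 2.0}

# The workaround from the diff: drop the last sorted key and cast the
# remaining float IDs to plain Python ints before any indexing happens.
node_list = sorted(stability.keys(), reverse=True)[:-1]
node_list = [int(n) for n in node_list]

print(node_list)  # [2, 1]
```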
```python
# before
edge_selection = np.prod(np.in1d(min_span_tree.T[:2], vertices).reshape(
    (min_span_tree.shape[0], 2), order='F'), axis=1).astype(bool)
# after
edge_selection = np.prod(
    np.isin(min_span_tree.T[:2], vertices), axis=0).astype(bool)
```
Seems correct, but may warrant a look. Anything else doesn't make sense to me anyway; the `order="F"` was breaking things when switching to `isin`, which doesn't flatten the result (it keeps the shape of its input).
The other places want to work with 1-D arrays as far as I can tell, so this distinction doesn't matter.
Yes, this should be correct. `isin` is a much nicer way to do this, but it didn't exist back when this was first written.
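A quick sanity check of this equivalence, using a made-up toy spanning tree: `np.isin` keeps the `(2, n_edges)` shape of its first argument, so reducing over axis 0 gives the same per-edge mask that the old `np.in1d` + Fortran-order reshape produced.

```python
import numpy as np

# Toy minimum spanning tree: rows of (u, v, weight); values are made up.
min_span_tree = np.array([[0., 1., 0.5],
                          [1., 2., 0.7],
                          [2., 3., 0.9]])
vertices = np.array([0, 1, 2])

# Old expression: in1d flattens its result, so a Fortran-order reshape
# was needed to get edges back on rows before reducing over axis 1.
old = np.prod(np.in1d(min_span_tree.T[:2], vertices).reshape(
    (min_span_tree.shape[0], 2), order='F'), axis=1).astype(bool)

# New expression: isin keeps the (2, n_edges) shape; reduce over axis 0.
new = np.prod(np.isin(min_span_tree.T[:2], vertices), axis=0).astype(bool)

# Edge (2, 3) touches vertex 3, which is not in `vertices`.
print(old, new)  # [ True  True False] [ True  True False]
assert np.array_equal(old, new)
```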
Thanks for taking a look at this. There are definitely some difficulties. There are good chunks of the codebase that are pretty old at this point and fail to use modern idioms that would make things easier. If you can change the testing grid to drop Python 3.7 and 3.8, and add 3.12 (so all the tests come out green), that would be appreciated.
Done, I guess it'll probably pass. The fact that floats are passed around as cluster IDs probably lets some things slip through. It might be that the integer conversion doesn't matter anymore so long as you build with NumPy 2, but then I am not sure that it builds right with NumPy 1.x and Cython 3; the integer conversion problem may be cropping up in other places as well. Not sure I can help with auditing that, if that is the case. The hacky way it could be solved is to steal something like this from the NumPy definitions (or just decide that …).
(The good way is probably a proper audit to actually use integers.)
Thanks for this, it is honestly greatly appreciated!
Thanks, it would be cool if you can do a release soon (and remember that if there is an issue with integers and Cython, it is plausible for Cython 3 + old NumPy 1.x).
This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single GPU test suite seems to be doing mostly fine. There is a single test remaining:

```
test_simpl_set.py::test_simplicial_set_embedding
```

is failing with:

```
(Pdb) cp.asarray(cu_embedding)
array([[23067.518, 23067.518],
       [17334.559, 17334.559],
       [22713.598, 22713.598],
       ...,
       [23238.438, 23238.438],
       [25416.912, 25416.912],
       [19748.943, 19748.943]], dtype=float32)
```

being completely different from the reference:

```
array([[5.330462 , 4.3419437],
       [4.1822557, 5.6225405],
       [5.200859 , 4.530094 ],
       ...,
       [4.852359 , 5.0026293],
       [5.361374 , 4.1475334],
       [4.0259256, 5.7187223]], dtype=float32)
```

And I am not sure why that might be. I will prod it a bit more, but it may need someone who knows the methods to have a look.

One wrinkle is that hdbscan is not yet released for NumPy 2, but I guess that is still required even though sklearn has a version? (Probably not a big issue, but my fixups in scikit-learn-contrib/hdbscan#644 run into some issue even though it doesn't seem NumPy 2 related.)

xref: rapidsai/build-planning#38

Authors:
- Sebastian Berg (https://github.com/seberg)
- https://github.com/jakirkham
- Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #5954
This adds NumPy 2 support, and for good measure also 3.12 wheels. I didn't bother with supporting 3.8 (which NumPy 2 doesn't support) for wheels, since scikit-learn doesn't anymore either, and you can use old wheels.
Tests pass locally, and will run with NumPy 2 now anyway, I presume (not that there are many tests).