[MNT] Small NumPy 2 related fixes #5954
Approved pre-commit-config changes
Thanks for the PR @seberg, I think someone like @divyegala can have a look at the remaining failure, but it'll probably need to wait until Monday.
Thanks to @divyegala for realizing the remaining error is related to Fixing that fixes the remaining failure (single gpu tests run locally). If
@seberg as far as I can tell from a quick search through the codebase, the reference UMAP package is only used for Python tests. Can we get away with patching umap and building it from source? This would let us push out a NumPy 2.0.0 compatible cuML with the rest of RAPIDS, although I do not know how quickly we want this, nor do I know when NumPy 2.0.1 will be released.
The main thing right now is that CuPy still needs a release, I think. So I suspect it might be easier to sit out NumPy 2.0.1. (But yeah, I think monkey patching will become plausible once CuPy is there.)
python/cuml/internals/array.py
Outdated
@@ -1172,12 +1172,16 @@ def from_input(
if (
    not fail_on_order and order != arr.order and order != "K"
) or make_copy:
Since we are checking this within the `if`, we could drop this check:
- ) or make_copy:
+ ):
I suspect you are right and this can be simplified. But with that check deleted, no copy would be made at all when only `make_copy` is set.
We could maybe delete the whole `if` (always creating a new `arr`), or assume a copy is always made (I am not certain that is currently true for e.g. 1-D arrays).
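The doubt about 1-D arrays can be illustrated with plain NumPy. This is only a sketch: it uses `np.asarray` as a stand-in for `arr.mem_type.xpy.asarray` and says nothing about how cuML labels `arr.order` for 1-D inputs. A contiguous 1-D array is both C- and F-contiguous, so asking for a different order does not force a copy, while a 2-D array does get converted.

```python
import numpy as np

one_d = np.arange(5)                # 1-D, contiguous in both C and F sense
two_d = np.arange(6).reshape(2, 3)  # 2-D, C-ordered only

# No layout change is needed for the 1-D case, so the memory is reused.
one_d_f = np.asarray(one_d, order="F")
print(np.shares_memory(one_d, one_d_f))  # True

# The 2-D case has to be rewritten in column-major order, so a new buffer is made.
two_d_f = np.asarray(two_d, order="F")
print(np.shares_memory(two_d, two_d_f))  # False
```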
We do not always do a copy and do not always want to. If you look above in the function, the `make_copy` variable is inferred from other conditions like
make_copy = force_contiguous and not arr.is_contiguous
which would make the complex conditional worse if we wanted to pack everything in. So we do not want to delete this.
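As a rough illustration of how such a flag can be derived, here is a sketch in plain NumPy. The helper `infer_make_copy` is hypothetical, and the contiguity test via `ndarray.flags` is an assumed stand-in for `arr.is_contiguous`; the actual logic in `from_input` may combine more conditions.

```python
import numpy as np

def infer_make_copy(a, force_contiguous):
    # Hypothetical stand-in for `arr.is_contiguous`: contiguous in either layout.
    is_contiguous = a.flags.c_contiguous or a.flags.f_contiguous
    # A copy is only requested when contiguity is demanded and not already given.
    return force_contiguous and not is_contiguous

base = np.arange(10)
strided = base[::2]  # every second element: a non-contiguous view

print(infer_make_copy(base, force_contiguous=True))      # False
print(infer_make_copy(strided, force_contiguous=True))   # True
print(infer_make_copy(strided, force_contiguous=False))  # False
```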
Ok maybe there's a better way to write the code below then: #5954 (comment)
Edit: Included a suggestion for clarity: #5954 (review)
python/cuml/internals/array.py
Outdated
if make_copy:
    data = arr.mem_type.xpy.array(
        arr.to_output("array"), order=order
    )
else:
    data = arr.mem_type.xpy.asarray(
        arr.to_output("array"), order=order
    )
It seems odd to check this above and here. Is there a better way to write this so we only check `make_copy` once? Perhaps with an `elif`?
Probably. I was working on adding polars support (which needs to happen here), so I'm reviewing all of these conditionals... would you mind opening an issue to clean this up so it doesn't block or slow down this PR?
For clarity tried to include some suggestions below. Though have no strong feelings on whether they are included
ref: #5954 (review)
This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single gpu test suite seems to be doing fine (not quite finished, I may push more commits, but can also open a new PR). The one thing I noticed that is a bit annoying is that hdbscan is not yet released for NumPy 2. Is that actually still required, since I think sklearn has a version? (I don't expect this to be a problem for long, but there is at least one odd test failure trying to make hdbscan work in scikit-learn-contrib/hdbscan#644)
Even if NumPy reverts, this is not a problem.
I am not actually sure what changed here, but deepcopy seems sensible?
Rebased, since there were conflicts. I think the remaining failures were unrelated, but maybe/hopefully the rebase resolves them anyway.
Including a proposal of how the array copying logic could be updated. Meant mostly to be illustrative. Defer to others on whether it is used
python/cuml/internals/array.py
Outdated
if (
    not fail_on_order and order != arr.order and order != "K"
) or make_copy:
- if (
-     not fail_on_order and order != arr.order and order != "K"
- ) or make_copy:
+ if not fail_on_order and order != arr.order and order != "K":
if make_copy:
    data = arr.mem_type.xpy.array(
        arr.to_output("array"), order=order
    )
else:
    data = arr.mem_type.xpy.asarray(
        arr.to_output("array"), order=order
    )

arr = cls(data, index=index)
-     if make_copy:
-         data = arr.mem_type.xpy.array(
-             arr.to_output("array"), order=order
-         )
-     else:
-         data = arr.mem_type.xpy.asarray(
-             arr.to_output("array"), order=order
-         )
-     arr = cls(data, index=index)
+     arr = cls(
+         arr.mem_type.xpy.asarray(arr.to_output("array"), order=order),
+         index=index,
+     )
+ elif make_copy:
+     arr = cls(
+         arr.mem_type.xpy.array(arr.to_output("array"), order=order), index=index
+     )
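To compare the two branch structures side by side, here is a standalone sketch. It is not cuML code: NumPy stands in for `arr.mem_type.xpy`, the function names are hypothetical, and `needs_reorder` is an assumed shorthand for `not fail_on_order and order != arr.order and order != "K"`.

```python
import numpy as np

def convert_current(x, order, needs_reorder, make_copy):
    """Nested form from the diff: make_copy is tested twice."""
    if needs_reorder or make_copy:
        if make_copy:
            return np.array(x, order=order)   # always a fresh buffer
        return np.asarray(x, order=order)     # copies only if the layout must change
    return x

def convert_suggested(x, order, needs_reorder, make_copy):
    """Flat if/elif form from the suggestion: each flag is tested once."""
    if needs_reorder:
        return np.asarray(x, order=order)
    elif make_copy:
        return np.array(x, order=order)
    return x

x = np.arange(6).reshape(2, 3)  # C-ordered 2-D input

# Both flags set: each form returns an F-ordered array in new memory.
a = convert_current(x, "F", needs_reorder=True, make_copy=True)
b = convert_suggested(x, "F", needs_reorder=True, make_copy=True)

# Copy only: both forms fall through to np.array and copy.
c = convert_current(x, "C", needs_reorder=False, make_copy=True)
d = convert_suggested(x, "C", needs_reorder=False, make_copy=True)
```

In this sketch the two forms can differ only when both flags are set and the input already satisfies the requested layout (for example a contiguous 1-D array), because the flat form then reaches `asarray` and may skip the copy. Whether that combination can occur inside `from_input` is not something the sketch can answer.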
Merging due to closeness to code-freeze. @jakirkham, capturing a task to improve and simplify the data processing code in #5995
/merge
This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single gpu test suite seems to be doing mostly fine. There is a single test remaining:
is failing with:
being completely different from the reference:
And I am not sure why that might be, I will prod it a bit more, but it may need someone who knows the methods to have a look.
One wrinkle is that hdbscan is not yet released for NumPy 2, but I guess that is still required even though sklearn has a version?
(Probably not a big issue, but my fixups in scikit-learn-contrib/hdbscan#644 run into some issue even though it doesn't seem NumPy 2 related.)
xref: rapidsai/build-planning#38