Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] Small NumPy 2 related fixes #5954

Merged
merged 10 commits into from
Jul 28, 2024
Merged

Conversation

seberg
Copy link
Contributor

@seberg seberg commented Jul 3, 2024

This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single gpu test suite seems to be doing mostly fine. There is a single test remaining:

test_simpl_set.py::test_simplicial_set_embedding

is failing with:

(Pdb) cp.asarray(cu_embedding)
array([[23067.518, 23067.518],
       [17334.559, 17334.559],
       [22713.598, 22713.598],
       ...,
       [23238.438, 23238.438],
       [25416.912, 25416.912],
       [19748.943, 19748.943]], dtype=float32)

being completely different from the reference:

array([[5.330462 , 4.3419437],
       [4.1822557, 5.6225405],
       [5.200859 , 4.530094 ],
       ...,
       [4.852359 , 5.0026293],
       [5.361374 , 4.1475334],
       [4.0259256, 5.7187223]], dtype=float32)

And I am not sure why that might be, I will prod it a bit more, but it may need someone who knows the methods to have a look.

One wrinkle is that hdbscan is not yet released for NumPy 2, but I guess that still required even though sklearn has a version?
(Probably, not a big issue, but my fixups scikit-learn-contrib/hdbscan#644 run into some issue even though it doesn't seem NumPy 2 related.)

xref: rapidsai/build-planning#38

@seberg seberg requested a review from a team as a code owner July 3, 2024 15:30
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jul 3, 2024
@seberg seberg requested a review from a team as a code owner July 3, 2024 16:50
@seberg seberg requested a review from KyleFromNVIDIA July 3, 2024 16:50
Copy link
Contributor

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pre-commit-config changes

@dantegd
Copy link
Member

dantegd commented Jul 3, 2024

Thanks for the PR @seberg, I think someone like @divyegala can have a look at the remaining failure, but it'll probably need to wait until Monday.

@tfeher tfeher added bug Something isn't working non-breaking Non-breaking change labels Jul 4, 2024
@seberg
Copy link
Contributor Author

seberg commented Jul 11, 2024

Thanks to @divyegala for realizing the remaining error is related to umap. umap uses np.unique with return_inverse which changed (although I will probably revert that for 2.0.1).

Fixing that fixes the remaining failure (single gpu tests run locally). If umap doesn't do a release we might need to wait for NumPy 2.0.1 for 100% test passing (also hdbscan is fixed but not released yet).

@divyegala
Copy link
Member

@seberg as far as I can tell from a quick search through the codebase, the reference UMAP package is only used for pyhon tests. Can we get away with patching umap and building it from source? This will let us push out numpy 2.0.0 compatible cuML with the rest of RAPIDS, although I do not know how quickly we want this and neither do I know when numpy 2.0.1 is releasing.

cc @dantegd @cjnolet if that's acceptable

@seberg
Copy link
Contributor Author

seberg commented Jul 11, 2024

The main thing right now is that CuPy still needs a release, I think. So I suspect it might be easier to sit out NumPy 2.0.1. (But yeah, I think monkey patching will become plausible once CuPy is there.)

@@ -1172,12 +1172,16 @@ def from_input(
if (
not fail_on_order and order != arr.order and order != "K"
) or make_copy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are checking this within the if, we could drop this check

Suggested change
) or make_copy:
):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you are right and this can be simplified. But deleting it would not make a copy at all.
We could maybe delete the whole if (always creating a new arr) or assuming a copy is always made (I am not certain that is currently true for e.g. 1-D arrays).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not always do a copy and not always want to. If you see above in the function, the make_copy variable is inferred from other conditions like

make_copy = force_contiguous and not arr.is_contiguous

which would make the complex conditional worse if we wanted to pack everything in. So we do not want to delete this.

Copy link
Member

@jakirkham jakirkham Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok maybe there's a better way to write the code below then: #5954 (comment)

Edit: Included a suggestion for clarity: #5954 (review)

Comment on lines 1175 to 1182
if make_copy:
data = arr.mem_type.xpy.array(
arr.to_output("array"), order=order
)
else:
data = arr.mem_type.xpy.asarray(
arr.to_output("array"), order=order
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to check this above and here. Is there a better way to write this so we only check make_copy once? Perhaps with an elif?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I was working on adding polars support (which needs to happen here), so I'm reviewing the whole conditionals... would you mind opening an issue to cleanup this so it doesn't block/slow down this PR?

Copy link
Member

@jakirkham jakirkham Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity tried to include some suggestions below. Though have no strong feelings on whether they are included

ref: #5954 (review)

seberg and others added 8 commits July 18, 2024 00:11
This applys some smaller NumPy 2 related fixes.  With (in progress)
cupy 13.2 fixups, the single gpu test suite seems to be doing fine
(not quite finished, I may push more commits, but can also open a new PR).

The one thinig I noticed that is a bit anonying is that hdbscan is not
yet released for NumPy 2, is that actually still required since I think
sklearn has a version?
(I don't expect this to be a problem for long, but there is at least one odd test
failure trying to make hdbscan work in scikit-learn-contrib/hdbscan#644)
Even if NumPy reverts, this is not a problem.
I am not actually sure what changed here, but deepcopy seems sensible?
@seberg
Copy link
Contributor Author

seberg commented Jul 18, 2024

Rebased, since there were conflicts. I think the remaining failures were unrelated, but maybe/hopefully the rebase resolves them anyway.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including a proposal of how the array copying logic could be updated. Meant mostly to be illustrative. Defer to others on whether it is used

Comment on lines 1175 to 1182
if make_copy:
data = arr.mem_type.xpy.array(
arr.to_output("array"), order=order
)
else:
data = arr.mem_type.xpy.asarray(
arr.to_output("array"), order=order
)
Copy link
Member

@jakirkham jakirkham Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity tried to include some suggestions below. Though have no strong feelings on whether they are included

ref: #5954 (review)

Comment on lines 1172 to 1174
if (
not fail_on_order and order != arr.order and order != "K"
) or make_copy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (
not fail_on_order and order != arr.order and order != "K"
) or make_copy:
if not fail_on_order and order != arr.order and order != "K":

Comment on lines +1175 to +1184
if make_copy:
data = arr.mem_type.xpy.array(
arr.to_output("array"), order=order
)
else:
data = arr.mem_type.xpy.asarray(
arr.to_output("array"), order=order
)

arr = cls(data, index=index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if make_copy:
data = arr.mem_type.xpy.array(
arr.to_output("array"), order=order
)
else:
data = arr.mem_type.xpy.asarray(
arr.to_output("array"), order=order
)
arr = cls(data, index=index)
arr = cls(
arr.mem_type.xpy.asarray(arr.to_output("array"), order=order),
index=index,
)
elif make_copy:
arr = cls(
arr.mem_type.xpy.array(arr.to_output("array"), order=order), index=index
)

@dantegd
Copy link
Member

dantegd commented Jul 28, 2024

Merging due to closeness to code-freeze, @jakirkham capturing a task to improve and simplify the data processing code in #5995

@dantegd
Copy link
Member

dantegd commented Jul 28, 2024

/merge

@rapids-bot rapids-bot bot merged commit 4338268 into rapidsai:branch-24.08 Jul 28, 2024
60 of 62 checks passed
@seberg seberg deleted the numpy2 branch July 29, 2024 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants