Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add public API and tests for hierarchical balanced k-means #1113

Merged
merged 53 commits into from
Jan 31, 2023

Conversation

Nyrio
Copy link
Contributor

@Nyrio Nyrio commented Dec 20, 2022

No description provided.

@Nyrio Nyrio requested review from a team as code owners December 20, 2022 16:30
@Nyrio Nyrio added 3 - Ready for Review feature request New feature or request non-breaking Non-breaking change labels Dec 20, 2022
@Nyrio Nyrio assigned tfeher and achirkin and unassigned Nyrio Jan 4, 2023
@cjnolet cjnolet assigned Nyrio and unassigned tfeher and achirkin Jan 4, 2023
@cjnolet
Copy link
Member

cjnolet commented Jan 4, 2023

@Nyrio i use the "assignee" in the PRs to mark the author of the PR so it's more easy to search the PRs. If you'd like to request @tfeher and @achirkin for review, please use the "reviewers" section of the PR.

@cjnolet cjnolet requested a review from tfeher January 4, 2023 16:32
@Nyrio Nyrio added the 4 - Waiting on Reviewer Waiting for reviewer to review or respond label Jan 24, 2023
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API looks good! Have you tried to compile and run the clusters on double? Also, I did a small workaround to the build_fine_clusters very recently - please, have a look and merge the changes #1161

cpp/include/raft/cluster/kmeans_balanced.cuh Show resolved Hide resolved
cpp/include/raft/cluster/kmeans_balanced.cuh Show resolved Hide resolved
cpp/include/raft/cluster/detail/kmeans_balanced.cuh Outdated Show resolved Hide resolved
cpp/include/raft/cluster/detail/kmeans_balanced.cuh Outdated Show resolved Hide resolved
@Nyrio
Copy link
Contributor Author

Nyrio commented Jan 24, 2023

Have you tried to compile and run the clusters on double?

That's covered in the new unit test.

@Nyrio Nyrio requested a review from a team as a code owner January 25, 2023 16:31
rapids-bot bot pushed a commit that referenced this pull request Jan 25, 2023
…rite_only_unary_op` (#1149)

Follows [a discussion](#1113 (comment)) that we had about `write_only_unary_op` / `writeOnlyUnaryOp`.

For consistency and to simplify the use of this primitive, we shouldn't require dereferencing a pointer in the functor.

This new implementation uses `thrust::tabulate` but we can add our own optimized kernel later if we need.

Authors:
  - Louis Sugy (https://github.com/Nyrio)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #1149
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nyrio, can you please update this PR to accept raft::device_resources instead of raft::handle_t everywhere?

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Louis for the updates, the PR looks good to me!

IdxT n_rows,
LabelT* labels,
rmm::mr::device_memory_resource* mr)
inline std::enable_if_t<std::is_floating_point_v<MathT>> predict_core(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that we can soon enable int8 types as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int8 dataset with floating-point centroids, or all int8? The former means repeating the conversion or cast operation many times in the distance kernels, and for the latter, we have to evaluate if there could be a loss of quality in the clustering, or if learning could somehow get stuck?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not sure) Aren't the distance kernels always memory-bound? If so, I don't think the conversion (which is usually a simple multiply + cast) would noticeably hurt performance.

@Nyrio Nyrio requested a review from achirkin January 26, 2023 16:24
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just couple minor things below.

cpp/test/cluster/kmeans_balanced.cu Outdated Show resolved Hide resolved
raft::device_matrix_view<const DataT, IndexT> X,
raft::device_matrix_view<MathT, IndexT> centroids,
MappingOpT mapping_op = raft::identity_op())
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you require the mapping to be identity when DataT == MathT, consider adding a static assert:

static_assert(!std::is_same_v<MathT, DataT> || !std::is_same_v<MappingOpT, raft::identity_op>,
  "Only the identity mapping is supported when MathT == DataT")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this would break the ANN methods that pass mapping<float> which behaves like the identity when the input type is float. We want an identity op, not the identity_op.

@cjnolet
Copy link
Member

cjnolet commented Jan 31, 2023

/merge

@rapids-bot rapids-bot bot merged commit f4ceb58 into rapidsai:branch-23.02 Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond CMake cpp feature request New feature or request non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

5 participants