-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add public API and tests for hierarchical balanced k-means #1113
Add public API and tests for hierarchical balanced k-means #1113
Conversation
… improve doxygens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The public API looks good! Have you tried to compile and run the clusters on double
? Also, I did a small workaround to the build_fine_clusters very recently - please, have a look and merge the changes #1161
That's covered in the new unit test. |
…commit-hook clang-format...
…rite_only_unary_op` (#1149) Follows [a discussion](#1113 (comment)) that we had about `write_only_unary_op` / `writeOnlyUnaryOp`. For consistency and to simplify the use of this primitive, we shouldn't require dereferencing a pointer in the functor. This new implementation uses `thrust::tabulate` but we can add our own optimized kernel later if we need. Authors: - Louis Sugy (https://github.com/Nyrio) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) URL: #1149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nyrio, can you please update this PR to accept raft::device_resources instead of raft::handle_t everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Louis for the updates, the PR looks good to me!
IdxT n_rows, | ||
LabelT* labels, | ||
rmm::mr::device_memory_resource* mr) | ||
inline std::enable_if_t<std::is_floating_point_v<MathT>> predict_core( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope that we can soon enable int8 types as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int8
dataset with floating-point centroids, or all int8
? The former means repeating the conversion or cast operation many times in the distance kernels, and for the latter, we have to evaluate if there could be a loss of quality in the clustering, or if learning could somehow get stuck?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not sure) Aren't the distance kernels always memory-bound? If so, I don't think the conversion (which is usually a simple multiply + cast) would noticeably hurt performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just couple minor things below.
raft::device_matrix_view<const DataT, IndexT> X, | ||
raft::device_matrix_view<MathT, IndexT> centroids, | ||
MappingOpT mapping_op = raft::identity_op()) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you require the mapping to be identity when DataT == MathT
, consider adding a static assert:
static_assert(!std::is_same_v<MathT, DataT> || !std::is_same_v<MappingOpT, raft::identity_op>,
"Only the identity mapping is supported when MathT == DataT")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this would break the ANN methods that pass mapping<float>
which behaves like the identity when the input type is float. We want an identity op, not the identity_op
.
/merge |
No description provided.