Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HOST_UDF aggregation for reduction and segmented reduction #17645

Open
wants to merge 26 commits into
base: branch-25.02
Choose a base branch
from

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Dec 20, 2024

Following #17592, this enables HOST_UDF aggregation in reduction and segmented reduction, allowing to execute a host-side user-defined function (UDF) through libcudf aggregation framework.

Closes #16633.

@ttnghia ttnghia added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Dec 20, 2024
@ttnghia ttnghia self-assigned this Dec 20, 2024
@ttnghia ttnghia requested review from a team as code owners December 20, 2024 19:25
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia
Copy link
Contributor Author

ttnghia commented Dec 29, 2024

The logic for host_udf_base interface is changed in this PR. Instead of one universal interface for aggregation contexts (groupby/reduction/segmented reduction) as designed in the previous work, now there are three separate interfaces for them. Implementation for each context needs to derive from the corresponding interface.

Because the interfaces are no longer designed to be universal for all contexts, they are much simpler, avoiding potential bugs due to universal usage of one interface for different contexts.

@ttnghia ttnghia requested a review from res-life January 1, 2025 06:14
@ttnghia ttnghia removed the request for review from davidwendt January 6, 2025 16:12
Copy link
Contributor

@nvdbaranec nvdbaranec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick first pass.

cpp/include/cudf/aggregation/host_udf.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/aggregation/host_udf.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/aggregation/host_udf.hpp Outdated Show resolved Hide resolved
ttnghia and others added 4 commits January 7, 2025 11:18
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@@ -1,6 +1,6 @@
/*
*
* Copyright (c) 2020-2023, NVIDIA CORPORATION.
* Copyright (c) 2020-2025, NVIDIA CORPORATION.
Copy link
Contributor Author

@ttnghia ttnghia Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the copyright header for the files that were forgotten in the previous PR. Github CI seems does not check Java files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[FEA] Allow to run groupby/reduction with externally derived aggregations
3 participants