Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Update the default hasher in libcudf #17531

Open
GregoryKimball opened this issue Dec 5, 2024 · 1 comment
Open

[FEA] Update the default hasher in libcudf #17531

GregoryKimball opened this issue Dec 5, 2024 · 1 comment
Labels
feature request New feature or request

Comments

@GregoryKimball
Copy link
Contributor

GregoryKimball commented Dec 5, 2024

Is your feature request related to a problem? Please describe.
Now that xxhash64 is available, #12829 we should evaluate the default hash function. The default hash function is MurmurHash3_x86_32 as of 25.02 .

This came up in #12829 (comment) where we might see performance benefits and register pressure reductions from using a different hash function.

Describe the solution you'd like
Compile results from hash join, mixed join and groupby aggregation benchmarks across the available hash functions.

Additional context
We will want to add support for XXHash_32 before running this investigation.

@bdice
Copy link
Contributor

bdice commented Dec 5, 2024

(Copying from my duplicate issue:)

I would like to know if switching our default hasher from MurmurHash3_32 to XXHash32 has an impact on performance. This was previously proposed in #12829 but deserves an issue of its own to be acted upon.

If the register usage is lower for XXHash32 than MurmurHash3_32, it might benefit some of cudf's most complex kernels. Mixed joins, for instance, combine hashing and AST execution. Similar ideas have been mentioned before in issues like #10587. I'm not confident that this would make a positive difference (I expect no significant effect, based on our past explorations of mixed joins), but it would be worth checking.

Originally posted by @bdice in #12829

@PointKernel PointKernel mentioned this issue Dec 5, 2024
3 tasks
rapids-bot bot pushed a commit that referenced this issue Jan 7, 2025
Contributes to #17531

This PR introduces the xxhash_32 hasher to libcudf as a preparatory step for evaluating the impact of replacing murmurhash3_x86_32 with xxhash_32 as the default hash.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #17533
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
Status: Needs owner
Development

No branches or pull requests

2 participants