Feature request: Need for different hashing algorithm functions in queries #3105
Comments
Any reason why you can't precompute the hashes? I wrote a Polars plugin for that a while ago.
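Precomputing the hashes could look like this minimal sketch. It uses the standard library's `hashlib` with SHA-256 as a stand-in (xxhash64 is not in the Python standard library), and the column names are illustrative, not taken from the thread:

```python
import hashlib

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

def row_hash(row, cols=("id", "name")):
    # Join column values with a separator so ("ab", "c") and ("a", "bc")
    # do not collide, then hash the joined string.
    payload = "|".join(str(row[c]) for c in cols).encode()
    return hashlib.sha256(payload).hexdigest()

for r in rows:
    r["row_hash"] = row_hash(r)
```

The same idea applies in Polars via an expression that concatenates the columns and hashes the result before writing or merging.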
Hi, thanks for the quick reply. I can precompute the hashes, that's no problem, but since I am performing a merge operation into a Delta table and need to hash the rows in the destination table, it would be more efficient to compute them in the update predicate of the merge rather than reading the target table twice (once to compute the hashes, again to merge). I was just curious whether there is a way to create a UDF somehow, or to expand the list of available hashing algorithms.
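One hypothetical workaround, while xxhash64 is unavailable: use a digest function that DataFusion's SQL context does provide (e.g. `md5`) inside the merge's update expressions. The table, alias, and column names below are illustrative, and the actual merge call is commented out because it needs a real Delta table on disk:

```python
# Assumed workaround: hash rows inside the merge with a DataFusion
# built-in digest function instead of xxhash64. Names are illustrative.
update_exprs = {
    "row_hash": "md5(concat_ws('|', source.id, source.name))",
}

# With a real deltalake.DeltaTable `dt` and a source dataframe `source`,
# the merge would look roughly like:
# (
#     dt.merge(source, predicate="target.id = source.id",
#              source_alias="source", target_alias="target")
#       .when_matched_update(updates=update_exprs)
#       .execute()
# )
```

Whether `md5`/`concat_ws` are accepted depends on the DataFusion version bundled with your delta-rs release, so treat this as a sketch rather than a guaranteed API.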
Registering UDFs is likely possible. You could take a look at datafusion-python and see whether you can replicate its registration functionality in our code base; I'm open to a PR for this.
From what I saw, your Python implementation does not use DataFusion directly; in your codebase DataFusion is used at the Rust level, before the port to Python. Is there any plan or chance that the UDF functionality gets implemented in delta-rs and exposed to Python? https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarUDF.html Thank you again for the reply!
It seems like the DataFusion project has started development on this feature; it would be nice to include it in delta-rs after they release it.
We don't have to include it explicitly. If they do a new major release, then we have to bump the dependency.
Request for hashing with xxhash64 in a merge operation (SQL context)
Hi,
I am using the Python port of deltalake with Polars and the deltalake lib. I currently need to hash some columns in a merge operation with the xxhash64 algorithm (via the update parameter), but the SQL context accepts only certain hashing functions. From the Rust code I traced down a list of accepted functions, after realising it is based on the DataFusion expression API (derived from the Rust library imports).
Is there any way, or are there plans, to expand this list with other hashing algorithms? Or to register a UDF in the Python API that I can use in the SQL context of a merge?
I saw that the following Rust crate is also imported in the project: https://crates.io/crates/twox-hash. It contains an implementation of the hash algorithm in my request.
Thanks for handling my request.
Use Case
Hashing with xxhash64 or XXH3-type algorithms: https://github.com/Cyan4973/xxHash