hash(s::ByteString, seed::Uint32) #392
It's occasionally useful to have control over the seed used by the hash algorithm. This adds a hash() method with an extra parameter to allow this. Currently it's 32-bit only; I'm not sure what the best way is to allow 64-bit as well.
Since there are hash functions for several types (some of which are recursive), and potentially for user-defined types, would it be better for this to be global like the random number generator seed?
A couple of use cases: Bloom filters and MinHash signatures. In both cases you want to hash the same value with N different hash functions (or the same hash function with N different seeds). For this, you don't want the seed to be global; however, it's fine if it doesn't apply to all types... strings or byte arrays are by far the most common here. Maybe we need a different function name?
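The Bloom filter case can be sketched as follows. This is a minimal illustration, assuming the two-argument `hash(x, h::UInt)` form available in current Julia rather than the `hash(s::ByteString, seed::Uint32)` signature proposed in this PR; `BloomFilter`, `bitindex`, `add!`, and `maybe_contains` are hypothetical names introduced only for this example.

```julia
# Minimal Bloom filter sketch: k hash functions are simulated by
# hashing the same key with k different seeds.
# All names here are hypothetical, not part of Base.
struct BloomFilter
    bits::BitVector
    seeds::Vector{UInt}
end

# nbits and k are illustrative parameters, not tuned values.
BloomFilter(nbits::Integer, k::Integer) = BloomFilter(falses(nbits), UInt.(1:k))

# Map a key and a seed to a 1-based bit index.
# Assumes the two-argument hash(x, h::UInt) of current Julia.
bitindex(bf::BloomFilter, key::AbstractString, seed::UInt) =
    Int(hash(key, seed) % UInt(length(bf.bits))) + 1

# Set one bit per seed.
function add!(bf::BloomFilter, key::AbstractString)
    for seed in bf.seeds
        bf.bits[bitindex(bf, key, seed)] = true
    end
    return bf
end

# May return a false positive, never a false negative.
maybe_contains(bf::BloomFilter, key::AbstractString) =
    all(bf.bits[bitindex(bf, key, seed)] for seed in bf.seeds)
```

Because the seeds live in the filter rather than in global state, two filters with different seed sets can coexist, which is the reason given above for not making the seed global.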
Ok, that makes sense. Do you see value in the 32-bit hash, which is presumably faster on 32-bit machines, or maybe we should just use the 64-bit hash function everywhere for simplicity? If we want to keep both, then memhash_seed should follow the pattern that's there for 64- and 32-bit platforms. Exported symbols also need to be added to While we're on the subject I'd like to get your opinion on another hashing issue. Currently we have
This seems to preclude efficient block-based hashing of Int8 or Uint8 arrays. Maybe we should make
I think we need to keep both. For the uses I have in mind, you're going to want explicit control over the memory use tradeoffs. So either there should be two different functions, or it should take a type parameter... I don't have a good enough sense yet of what would be idiomatic, so let me know what you'd like to see. For the second question: I think it's useful to have an equality function that automatically promotes, but I think you're right that the one used by hash tables should be stricter; maybe the looser one should be named isequivalent?
Then let's continue to use memhash32 on 32-bit, and memhash on 64-bit. On 64-bit machines, I believe Thanks for the feedback. We will probably be making some changes along those lines soon.
Ok, I'll make that change. What do you think about it being a method on hash that takes the seed param but only works for strings, vs. a separate function?
It's ok for it to be a method of
The memhash_seed functions now choose between 32-bit and 64-bit hashing the same way the fixed-seed ones do. I also added them to julia.expmap.
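For readers following along, the idea of picking the hash width from the platform word size can be sketched in Julia. This is only an illustration under stated assumptions: `seeded_fnv1a`, `seeded_hash`, `NativeWord`, and the FNV-1a algorithm are stand-ins chosen for the example and are not the memhash_seed implementation in this PR; folding the seed into the offset basis is likewise an assumption.

```julia
# Hypothetical seeded FNV-1a, parameterised on the word type.
# A stand-in for illustration only, not the algorithm memhash_seed uses.
fnv_prime(::Type{UInt32})  = 0x01000193
fnv_prime(::Type{UInt64})  = 0x00000100000001b3
fnv_offset(::Type{UInt32}) = 0x811c9dc5
fnv_offset(::Type{UInt64}) = 0xcbf29ce484222325

function seeded_fnv1a(::Type{T}, data::AbstractVector{UInt8},
                      seed::UInt32) where {T<:Union{UInt32,UInt64}}
    # Assumption: the seed is mixed in by XOR-ing it into the offset basis.
    h = fnv_offset(T) ⊻ T(seed)
    for b in data
        h = (h ⊻ T(b)) * fnv_prime(T)   # unsigned multiplication wraps
    end
    return h
end

# Choose the width once, from the platform word size.
const NativeWord = Sys.WORD_SIZE == 64 ? UInt64 : UInt32

# The seeded entry point uses the same width rule on every call.
seeded_hash(s::String, seed::UInt32) =
    seeded_fnv1a(NativeWord, codeunits(s), seed)
```

Keeping the width decision in one place means the seeded and fixed-seed entry points cannot disagree about which variant a given platform uses.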