hash(s::ByteString, seed::Uint32) #392

Merged
merged 2 commits into from
Feb 19, 2012

Conversation

avibryant
Contributor

It's occasionally useful to have control over the seed being used by
the hash algorithm. This adds a hash() method with an extra parameter
to allow this. Currently it's 32-bit only; I'm not sure what the best way
is to allow 64-bit as well.

@JeffBezanson
Member

Since there are hash functions for several types (some of which are recursive), and potentially for user-defined types, would it be better for this to be global like the random number generator seed?

@avibryant
Contributor Author

A couple of use cases: bloom filters and minhash signatures. In both cases you want to hash the same value with N different hash functions (or the same hash function with N different seeds). For this, you don't want the seed to be global; however, it's fine if it doesn't apply to all types: strings and byte arrays are by far the most common inputs here. Maybe we need a different function name?
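
To make the Bloom filter use case concrete, here is a minimal sketch in current Julia syntax. It assumes that the two-argument `hash(x, h::UInt)` method in today's Base can stand in for the seeded hash proposed in this PR. `BloomFilter`, `positions`, `add!` and `maybe_contains` are hypothetical names chosen for illustration, not part of the patch.

```julia
# Minimal Bloom filter sketch: each item sets k bits, one per seed.
struct BloomFilter
    bits::BitVector   # m-bit table
    k::Int            # number of seeded hashes per item
end

# Convenience constructor (hypothetical): m bits, all cleared, k seeds.
BloomFilter(m::Integer, k::Integer) = BloomFilter(falses(m), k)

# Bit positions for `item`: hash it once per seed and reduce into 1:m.
function positions(bf::BloomFilter, item::AbstractString)
    m = UInt(length(bf.bits))
    return [Int(hash(item, UInt(seed)) % m) + 1 for seed in 1:bf.k]
end

function add!(bf::BloomFilter, item::AbstractString)
    for i in positions(bf, item)
        bf.bits[i] = true
    end
    return bf
end

# false means definitely absent; true means possibly present.
maybe_contains(bf::BloomFilter, item::AbstractString) =
    all(bf.bits[i] for i in positions(bf, item))

bf = BloomFilter(1024, 3)
add!(bf, "julia")
maybe_contains(bf, "julia")   # true: an added item is always reported
```

A global seed would not work here, because all k positions have to be computed for the same item within one call.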

@JeffBezanson
Member

Ok, that makes sense. Do you see value in the 32-bit hash, which is presumably faster on 32-bit machines, or maybe we should just use the 64-bit hash function everywhere for simplicity? If we want to keep both, then memhash_seed should follow the pattern that's there for 64- and 32-bit platforms.

Exported symbols also need to be added to src/julia.expmap, since on Linux we set symbols to be hidden by default.

While we're on the subject I'd like to get your opinion on another hashing issue. Currently we have isequal true for equal numbers of different types, so for example

julia> isequal(int8([1,2]), [1.0,2.0])
true

This seems to preclude efficient block-based hashing of Int8 or Uint8 arrays. Maybe we should make isequal stricter so byte arrays can be hashed faster?
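
The constraint can be illustrated with a short sketch in current Julia syntax, where `Int8[1, 2]` replaces the older `int8([1,2])` spelling (an assumption about how the example translates to today's Julia). Values that are `isequal` have to hash identically to behave as the same `Dict` key, yet the two arrays occupy different numbers of bytes, so hashing the raw memory block could not give matching results.

```julia
a = Int8[1, 2]        # current spelling of int8([1,2])
b = [1.0, 2.0]

isequal(a, b)          # true: equal values despite different element types
hash(a) == hash(b)     # true: required so both act as the same Dict key

# The underlying memory differs, so a byte-wise block hash would disagree.
sizeof(a)              # 2 bytes
sizeof(b)              # 16 bytes

d = Dict(a => "bytes")
haskey(d, b)           # true: lookup relies on hash and isequal agreeing
```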

@avibryant
Contributor Author

I think we need to keep both. For the uses I have in mind, you're going to want to have explicit control over the memory use tradeoffs. So either two different functions or it should take a type parameter... I don't have a good enough sense yet of what would be idiomatic, let me know what you'd like to see.

For the second question: I think it's useful to have an equality function that automatically promotes, but I think you're right that the one used by hash tables should be stricter; maybe the looser one should be named isequivalent?

@JeffBezanson
Member

Then let's continue to use memhash32 on 32-bit, and memhash on 64-bit. On 64-bit machines, I believe MurmurHash3_x64_128 is basically as fast as MurmurHash3_x86_32 (if not faster), so we should just call that and you can truncate the value to 32 bits if you want to save space.

Thanks for the feedback. We will probably be making some changes along those lines soon.
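
As a rough sketch of the truncation idea for the minhash use case: hash at the native word size, keep only the low 32 bits, and store the signature as `UInt32` values to halve its memory footprint on 64-bit machines. The code uses current Julia syntax and assumes today's two-argument `hash(x, h::UInt)` in place of the seeded hash from this PR. `hash32`, `minhash_signature` and `similarity` are hypothetical helper names.

```julia
# Hash at native word size, then keep only the low 32 bits.
hash32(s::AbstractString, seed::Integer) = hash(s, UInt(seed)) % UInt32

# MinHash signature: for each seed, the minimum truncated hash over the set.
minhash_signature(items, nseeds::Integer) =
    UInt32[minimum(hash32(s, seed) for s in items) for seed in 1:nseeds]

# The fraction of matching slots estimates the Jaccard similarity of two sets.
similarity(sig_a, sig_b) = count(sig_a .== sig_b) / length(sig_a)

sig1 = minhash_signature(["apple", "banana", "cherry"], 64)
sig2 = minhash_signature(["cherry", "apple", "banana"], 64)
similarity(sig1, sig2)   # 1.0: same set, so every slot matches
```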

@avibryant
Contributor Author

Ok, I'll make that change. What do you think about it being a method on hash that takes the seed param but only works for strings, vs. a separate function?

@JeffBezanson
Member

It's ok for it to be a method of hash, unless there's some other use for calling hash with multiple arguments. I can't really think of one, except maybe having hash(x, y, z, ...) hash and mix multiple values, but that's reaching a bit.
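
For what it's worth, the "hash and mix multiple values" idea can be expressed on top of a seeded hash by feeding each result back in as the seed for the next value. The sketch below uses current Julia syntax and today's two-argument `hash(x, h::UInt)`. `mixhash` is a hypothetical name, not something proposed in this PR.

```julia
# Hypothetical helper: fold several values into one hash by using the
# running hash as the seed for the next value.
mixhash(seed::UInt, xs...) = foldl((h, x) -> hash(x, h), xs; init = seed)

# Equivalent to nesting the calls by hand:
mixhash(UInt(0), "a", "b") == hash("b", hash("a", UInt(0)))   # true
```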

The memhash_seed functions now choose between 32-bit and 64-bit hashing the
same way the fixed-seed ones do. Also added them to julia.expmap.
JeffBezanson added a commit that referenced this pull request Feb 19, 2012
@JeffBezanson JeffBezanson merged commit 14e490b into JuliaLang:master Feb 19, 2012
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Oct 11, 2021
Avoid overwriting auto-generated *Weights constructors
dkarrasch pushed a commit that referenced this pull request Jul 22, 2023
Stdlib: SparseArrays
URL: https://github.com/JuliaSparse/SparseArrays.jl.git
Stdlib branch: main
Julia branch: master
Old commit: b4b0e72
New commit: 99c99b4
Julia version: 1.11.0-DEV
SparseArrays version: 1.10.0 (Does not match)
Bump invoked by: @dkarrasch
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaSparse/SparseArrays.jl@b4b0e72...99c99b4

```
$ git log --oneline b4b0e72..99c99b4
99c99b4 Specialize 3-arg `dot` for sparse self-adjoint matrices (#398)
cb10c1e use unwrapping mechanism for triangular matrices (#396)
b3872c8 added warning for iterating while mutating a sparse matrix (#415)
f8f0f40 bring coverage of fixed SparseMatrixCSC to 100% (#392)
0eb9c04 fix typos (#414)
```

Co-authored-by: Dilum Aluthge <[email protected]>
KristofferC pushed a commit that referenced this pull request Jul 24, 2023
Stdlib: SparseArrays
URL: https://github.com/JuliaSparse/SparseArrays.jl.git
Stdlib branch: main
Julia branch: master
Old commit: b4b0e72
New commit: 99c99b4
Julia version: 1.11.0-DEV
SparseArrays version: 1.10.0 (Does not match)
Bump invoked by: @dkarrasch
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaSparse/SparseArrays.jl@b4b0e72...99c99b4

```
$ git log --oneline b4b0e72..99c99b4
99c99b4 Specialize 3-arg `dot` for sparse self-adjoint matrices (#398)
cb10c1e use unwrapping mechanism for triangular matrices (#396)
b3872c8 added warning for iterating while mutating a sparse matrix (#415)
f8f0f40 bring coverage of fixed SparseMatrixCSC to 100% (#392)
0eb9c04 fix typos (#414)
```

Co-authored-by: Dilum Aluthge <[email protected]>
(cherry picked from commit 6691a75)
Keno pushed a commit that referenced this pull request Oct 9, 2023
Rework the linetable internal API and prevent more invalidations