Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #161: Implement == and hash for IndexLens and ComposedLens #162

Merged
merged 4 commits into from
Sep 16, 2021

Conversation

phipsgabler
Copy link
Collaborator

@phipsgabler phipsgabler commented Sep 13, 2021

Now I'm not an expert in these matters, but is seems the bottleneck is not hashing the tuples, but the types or symbols (since hash(::Symbol) falls back to objectid, too). In the commit without adding type symbols to hash (2ca5404):

julia> @benchmark hash($((1, )))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  0.024 ns  0.074 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     0.028 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.029 ns ± 0.003 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                          
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ ▁
  0.074 ns       Histogram: frequency by time      0.026 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark hash($(@lens(_[1])))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  0.024 ns  12.903 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     0.028 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.030 ns ±  0.129 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                           
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ ▁
  0.063 ns       Histogram: frequency by time       0.027 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

With the symbols (880daf4):

julia> @benchmark hash($((1, )))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  0.024 ns  0.062 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     0.028 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.029 ns ± 0.003 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                          
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ ▁
  0.062 ns       Histogram: frequency by time      0.027 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark hash($(@lens(_[1])))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min  max):   9.690 ns  44.821 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     10.423 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.712 ns ±  1.721 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                            
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁  ▁
  15.1 ns         Histogram: frequency by time        10.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Without the symbol, an index lens hashes to the same as the underlying tuple, though:

julia> hash(@lens(_[1]))
0xf45b969ce741991e

julia> hash((1,))
0xf45b969ce741991e

That could be solved by using a magic constant instead, but is that a good idea?

CC @torfjelde

Copy link
Owner

@jw3126 jw3126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @phipsgabler for looking into this.

That could be solved by using a magic constant instead, but is that a good idea?

Do you have any specific drawbacks in mind with the magic constant approach? If not I would prefer to have one. (Also for ComposedLens). Possible definition for the constant:

const SALT_IndexLens = objectid(IndexLens)

test/test_core.jl Show resolved Hide resolved
@phipsgabler
Copy link
Collaborator Author

Do you have any specific drawbacks in mind with the magic constant approach? If not I would prefer to have one. (Also for ComposedLens). Possible definition for the constant:

const SALT_IndexLens = objectid(IndexLens)

None other than me having no idea whether this is accepted good practice. The only objection I could remotely think of is: is it guaranteed that objectid(IndexLens) (or even hash(:IndexLens)) is constant in a distributed setting? Is that ever (considered) a problem?

If it's OK with you, I'll go ahead with it.

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

None other than me having no idea whether this is accepted good practice. The only objection I could remotely think of is: is it guaranteed that objectid(IndexLens) (or even hash(:IndexLens)) is constant in a distributed setting? Is that ever (considered) a problem?

I don't know the answer to these either. To be safer, let's just hardcode the following values:

julia> using Setfield

julia> objectid(Setfield.IndexLens)
0x8aa710fd8cb95e7e

julia> objectid(Setfield.ComposedLens)
0x1c9765b995720e93

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

I also asked about this on discourse.

@JeffreySarnoff
Copy link

To be safer, let's just hardcode the following values

If by hardcode you mean const value = objectid(object), objectid(object) is highly likely to change with different uses of the package and also with different runs of some use of the package. In a distributed setting, too. It would be trouble to have hash values for the same item in distinct contexts within a single run differ if they were to be checked for equality:

isequal is the comparison function used by hash tables (Dict). isequal(x,y)
must imply that hash(x) == hash(y). ...
Furthermore, isequal is linked with isless, and they work together to define a
fixed total ordering, where exactly one of isequal(x, y), isless(x, y), or
isless(y, x) must be true (and the other two false).

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

Thanks a lot @JeffreySarnoff for the clarification. By hardcoding I meant

const value = 0x8aa710fd8cb95e7e # objectid(Setfield.IndexLens)

objectid(object) is highly likely to change with different uses of the package and also with different runs of some use of the package.

But the default hash implementation uses objectid right? So the default Base.hash would suffer from the same instabilities?

@JeffreySarnoff
Copy link

not for all things -- Strings, Tuples of Ints and other bitstypes for example keep their objectids

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

struct S{T}
    value::T
end

function hash_cheat(obj, h::UInt)
    hash(obj.value, h)
end

const SALT = 0x8aa710fd8cb95e7e

function hash1(obj, h::UInt)
    hash(obj.value, hash(SALT, h))
end

function hash2(obj, h::UInt)
    hash(obj.value, h+SALT)
end

obj = S((1,))
seed = UInt(1)
using BenchmarkTools

@btime hash_cheat($(Ref(obj))[], $(Ref(seed))[])
@btime hash1($(Ref(obj))[], $(Ref(seed))[])
@btime hash2($(Ref(obj))[], $(Ref(seed))[])

#  1.833 ns (0 allocations: 0 bytes)
#  1.813 ns (0 allocations: 0 bytes)
#  1.833 ns (0 allocations: 0 bytes)

I think we should incorporate some type specific salt into our hash and based on the above benchmark,
it seems there is no performance difference between using + or hash for the mixing.
So I suggest we use the following implementation:

function Base.hash(o::IndexLens, h::UInt)
    salt = 0x8aa710fd8cb95e7e #objectid(Setfield.IndexLens)
    hash(o.indices, hash(salt, h))
end

@JeffreySarnoff
Copy link

JeffreySarnoff commented Sep 14, 2021

it seems there is no performance difference between using + or hash for the mixing.

There is under heavy use (Base knows best); simple benchmarking will not reveal that.

You should define your salt constant outside of and just above the hash function (better practice) and you must provide both a 64bit and a 32bit constant as hash uses UInt which may be either one.

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

it seems there is no performance difference between using + or hash for the mixing.

There is under heavy use (Base knows best); simple benchmarking will not reveal that.

Base operates under many constraints like generality and backward compatibility, so "Base knows best" is not obvious to me in this case. But anyway lets do it the Base way:

function make_salt(s64::UInt64)::UInt
    if UInt === UInt64
        return s64
    else
        return UInt32(s64 >> 32) ^ UInt32(s64 & 0x00000000ffffffff)
    end
end

const SALT_IndexLens = make_salt(0x8aa710fd8cb95e7e) # objectid(Setfield.IndexLens)
Base.hash(o::IndexLens, h::UInt) = hash(o.indices, SALT_IndexLens + h)

@JeffreySarnoff
Copy link

JeffreySarnoff commented Sep 14, 2021

Cool. Point taken. Base knows best about hashing Julia constructs (I should have been more explicit). A great deal of effort has gone into the hashing approach and its most efficient implementation.

@jw3126
Copy link
Owner

jw3126 commented Sep 14, 2021

Thanks again @JeffreySarnoff for all your patience and explanations, I appreciate them.

@phipsgabler
Copy link
Collaborator Author

Great! I'll update to the latest version in the next couple of days, plus a few more tests.

Copy link
Owner

@jw3126 jw3126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Can you increase the minor version number @phipsgabler, since this is technically breaking? I will then tag a release immediately.

@jw3126 jw3126 merged commit d8938be into jw3126:master Sep 16, 2021
@phipsgabler phipsgabler deleted the phg/hashing branch September 16, 2021 07:00
@jw3126
Copy link
Owner

jw3126 commented Sep 16, 2021

@phipsgabler thanks again! This is now available in Setfield 0.8.

jw3126 added a commit to JuliaObjects/Accessors.jl that referenced this pull request Oct 1, 2021
jw3126 added a commit to JuliaObjects/Accessors.jl that referenced this pull request Oct 3, 2021
@tkf
Copy link
Collaborator

tkf commented Oct 3, 2021

@jw3126 Sorry if you've already thought through this, but, I think this is actually technically non-breaking. The only documented public interface we need to satisfy is isequal(x,y) implies hash(x)==hash(y). So, unless we've already documented that hash(lens) satisfies some additional property, we can freely re-define hash even for patch releases (that'd be annoying, but technically OK) provided it satisfies the above invariance. It's even fine to increment the salt for each release or just use hash(::Lens) = UInt(0) (of course, I'm not suggesting this 🙂).

(I just thought to ramble this, since I noticed JuliaObjects/Accessors.jl#31)

@jw3126
Copy link
Owner

jw3126 commented Oct 4, 2021

Thanks @tkf So for this PR we already tagged a breaking release. But in the case of Accessors your advise is to only bump the patch version?

@tkf
Copy link
Collaborator

tkf commented Oct 4, 2021

Yeah, I think Setfield.jl is good as-is. But I don't think Accessors.jl needs a breaking release.

aplavin pushed a commit to aplavin/Accessors.jl that referenced this pull request Jun 8, 2022
aplavin pushed a commit to aplavin/Accessors.jl that referenced this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants