-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Use sync.Pool for blake3 hashers #6044
Conversation
bors try |
bors cancel |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #6044 +/- ##
=======================================
Coverage 81.4% 81.4%
=======================================
Files 297 299 +2
Lines 31965 31999 +34
=======================================
+ Hits 26043 26073 +30
- Misses 4225 4229 +4
Partials 1697 1697 ☔ View full report in Codecov by Sentry. |
tryBuild failed: |
bors try |
tryBuild failed: |
bors try |
tryBuild succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add Closes #5850 in the description.
activation/wire/wire_v1.go
Outdated
h.Reset() | ||
hash.PutHasher(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting to Reset()
the hasher would have tragic consequences. How about resetting it automatically in PutHasher()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @poszu for the review. So this came up obviously while sketching this change out and here is the reasoning on why I've done it this way:
- hashing is a fairly easy operation which is usually compact in terms of usage, so it is easy to see where the hasher is acquired and therefore make sure it is not held too long (so it goes back into the pool ASAP) and release it manually
- The reason why I kept it simply in the pool is to just make sure objects come in and out of the pool, without too much boilerplate around it
- The methods are documented as requiring the consumers to do a
Reset
, just as in some APIs it is documented that consumers are expected to hold a lock on some mutex or synchronization primitive - I did not put the cleanups in a
defer
statement because putting it in a closure means it adds another minimum of 2 operations for every defer call (one of evaluating the empty closure at thedefer
call, and another when the anonymous function that exists inside of it). Obviously this can be replaced by just callingdefer
twice with theReset
andPutHasher
separately. Since we're doing a lot of hashing I would try to reduce overhead on all of these corollary ops. - If you insist of having the cleanups in a
defer
statement (usually I would vote to do the same, apart from several exceptions), I would suggest to change thePutHasher
method to bePutResetHasher
instead, and that the call toReset
would then be inlined there, and then call it in a singledefer
call in all places where it could be used as adefer
. There are a few places in the PR where I wouldn't use a defer because it would mean that the instance unnecessarily stays with the consuming component and then I would just release it manually so that other callers could also make use it ASAP. Let me know if this is acceptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The methods are documented as requiring the consumers to do a Reset, just as in some APIs it is documented that consumers are expected to hold a lock on some mutex or synchronization primitive
I know, but there is nothing enforcing this rule and if it can be forgotten, it will be forgotten. IMHO, if the hasher MUST be reset before it is returned to the pool, then it should be implemented in a way to make it impossible to make a mistake. The easiest way of achieving it is to reset it automatically in the Put
function.
I would suggest to change the PutHasher method to be PutResetHasher
I think that's not necessary. The user of this API doesn't need to know that the hasher is reset. The user only wants to get a hasher and return it.
I did not put the cleanups in a defer statement because putting it in a closure means it adds another minimum of 2 operations for every defer call (one of evaluating the empty closure at the defer call, and another when the anonymous function that exists inside of it).
I don't understand, do you mean that:
defer func() {
h.Reset()
hash.Put(h)
}()
is too much overhead because of the extra closure? I didn't check but I would be surprised if that's not inlined.
There are a few places in the PR where I wouldn't use a defer because it would mean that the instance unnecessarily stays with the consuming component and then I would just release it manually so that other callers could also make use it ASAP.
I understand, then it's good to release it ASAP manually as you say. The alternative could be to extract the hashing part into a function and continue relying on defer there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @poszu I'll amend as suggested.
And as you said the defer closure will indeed get inlined (I wrote this simple example to see for my own eyes): https://termbin.com/umrn
common/types/ballot.go
Outdated
func (b *Ballot) HashInnerBytes() []byte { | ||
h := hash.New() | ||
h := hash.GetHasher() | ||
_, err := codec.EncodeTo(h, &b.InnerBallot) | ||
if err != nil { | ||
log.With().Fatal("failed to encode InnerBallot for hashing", log.Err(err)) | ||
} | ||
return h.Sum(nil) | ||
sum := h.Sum(nil) | ||
h.Reset() | ||
hash.PutHasher(h) | ||
return sum | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to always put it back with defer
to avoid leaking when the code is refactored in the future to have an early exit (for example due to error handling). Perhaps consider an API similar to the one in the context
package? I.e. return the "free" function along with the hasher. Using it would look as follows:
h, free := hash.GetHasher()
defer free()
hash/pool.go
Outdated
// GetHasher will get a blake3 hasher from the pool. | ||
// It may or may not allocate a new one. Consumers are expected | ||
// to call Reset() on the hasher before putting it back in | ||
// the pool. | ||
func GetHasher() *blake3.Hasher { | ||
return pool.Get().(*blake3.Hasher) | ||
} | ||
|
||
// PutHasher returns the hasher back to the pool. | ||
// Consumers are expected to call Reset() on the | ||
// instance before putting it back in the pool. | ||
func PutHasher(hasher *blake3.Hasher) { | ||
pool.Put(hasher) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed API would look like:
// GetHasher will get a blake3 hasher from the pool. | |
// It may or may not allocate a new one. Consumers are expected | |
// to call Reset() on the hasher before putting it back in | |
// the pool. | |
func GetHasher() *blake3.Hasher { | |
return pool.Get().(*blake3.Hasher) | |
} | |
// PutHasher returns the hasher back to the pool. | |
// Consumers are expected to call Reset() on the | |
// instance before putting it back in the pool. | |
func PutHasher(hasher *blake3.Hasher) { | |
pool.Put(hasher) | |
} | |
// GetHasher will get a blake3 hasher from the pool. | |
// It may or may not allocate a new one. Consumers must | |
// call the returned function when they don't need the hasher anymore | |
// to return it back to the pool. Forgetting to do so would leak the hasher. | |
func GetHasher() (*blake3.Hasher, func()) { | |
hasher := pool.Get().(*blake3.Hasher) | |
free := func() { | |
hasher.Reset() | |
pool.Put(hasher) | |
} | |
return hasher, free | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could also return a type that is a *blake3.Hasher
but also implements a Close()
method that returns it to the pool and then:
hh := hash.New()
defer hh.Close()
where New
fetches from the pool and Close
puts it back. But I'm OK with the other pattern as well.
fb48a6e
to
e06dff6
Compare
bors try |
tryBuild succeeded: |
bors merge |
## Motivation Improve memory allocations using a memory pool for hashers. <!-- Give a brief description of the motivation for this PR. 1-2 sentences is fine. -->
Pull request successfully merged into develop. Build succeeded: |
Motivation
Improve memory allocations using a memory pool for hashers.
Description
Our nodes use a lot of blake3 hashers and this is visible on heap profiles:
This PR aims to amortize the costs of allocations of these hashers over time by using a
sync.Pool
instead of just using these hashers as one-off disposable hashers (they are built to be reused using theReset()
method on thehash.Hasher
interface exactly for this purpose).Fixes #5850
Test Plan