[Merged by Bors] - Use sync.Pool for blake3 hashers #6044

Closed · acud wants to merge 4 commits into develop from blake3cache

Conversation

@acud (Contributor) commented Jun 13, 2024

Motivation

Improve memory allocations using a memory pool for hashers.

Description

Our nodes allocate a lot of blake3 hashers, and this is clearly visible in heap profiles:

(pprof) sample_index=alloc_space
(pprof) top
Showing nodes accounting for 51611.76GB, 79.82% of 64661.28GB total
Dropped 3308 nodes (cum <= 323.31GB)
Showing top 10 nodes out of 134
      flat  flat%   sum%        cum   cum%
28458.86GB 44.01% 44.01% 28458.86GB 44.01%  github.com/zeebo/blake3.New
 7623.76GB 11.79% 55.80%  7625.85GB 11.79%  github.com/libp2p/go-buffer-pool.(*BufferPool).Get
 4043.65GB  6.25% 62.06%  4043.65GB  6.25%  github.com/libp2p/go-libp2p-pubsub/pb.(*ControlIHave).Unmarshal
 3523.27GB  5.45% 67.50%  3523.27GB  5.45%  github.com/libp2p/go-libp2p-pubsub/pb.(*Message).Unmarshal
 2199.32GB  3.40% 70.91%  2199.32GB  3.40%  github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).getPeers

This PR amortizes the allocation cost of these hashers over time by drawing them from a sync.Pool instead of using them as one-off disposable objects. The hashers are built to be reused via the Reset() method on the hash.Hasher interface for exactly this purpose.
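
For reference, the pool backing this change can be as small as the following sketch (a minimal illustration assuming the zeebo/blake3 package; the actual code lives in hash/pool.go in this diff):

package hash

import (
	"sync"

	"github.com/zeebo/blake3"
)

// pool holds reusable blake3 hashers. sync.Pool amortizes allocations
// across goroutines and lets the GC reclaim idle hashers under pressure.
var pool = sync.Pool{
	New: func() any { return blake3.New() },
}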

Fixes #5850

Test Plan

  • Manually test

@acud (Contributor, Author) commented Jun 13, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 13, 2024
@acud (Contributor, Author) commented Jun 13, 2024

bors cancel

@codecov bot commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.4%. Comparing base (8228d0c) to head (e906675).

Files                        Patch %   Lines
eligibility/fixedoracle.go   50.0%     2 Missing ⚠️
genvm/vm.go                  83.3%     0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #6044   +/-   ##
=======================================
  Coverage     81.4%   81.4%           
=======================================
  Files          297     299    +2     
  Lines        31965   31999   +34     
=======================================
+ Hits         26043   26073   +30     
- Misses        4225    4229    +4     
  Partials      1697    1697           


@spacemesh-bors

try

Build failed.

@acud (Contributor, Author) commented Jun 14, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 14, 2024
@spacemesh-bors

try

Build failed.

@acud (Contributor, Author) commented Jun 14, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 14, 2024
@spacemesh-bors

try

Build succeeded.

@poszu (Contributor) left a comment:

You could add Closes #5850 in the description.

Comment on lines 138 to 139
h.Reset()
hash.PutHasher(h)
@poszu (Contributor):

Forgetting to Reset() the hasher would have tragic consequences. How about resetting it automatically in PutHasher()?

@acud (Contributor, Author):

Thanks @poszu for the review. This came up while sketching the change out, and here is the reasoning behind doing it this way:

  • Hashing is a cheap operation with a usually compact usage site, so it is easy to see where a hasher is acquired, make sure it is not held for too long, and release it manually so it goes back into the pool ASAP.
  • I kept the pool plain so that objects simply move in and out of it, without extra boilerplate around it.
  • The methods are documented as requiring consumers to call Reset(), just as some APIs document that consumers are expected to hold a lock on a mutex or other synchronization primitive.
  • I did not put the cleanup in a defer statement because wrapping it in a closure adds at least two extra operations per call: one to evaluate the closure at the defer site, and another to invoke the anonymous function when the defer runs. This could be avoided by calling defer twice, once for Reset and once for PutHasher, but since we do a lot of hashing I would rather shave overhead off all of these corollary operations (see the sketch after this list).
  • If you insist on having the cleanup in a defer statement (usually I would vote for the same, apart from a few exceptions), I would suggest renaming PutHasher to PutResetHasher, inlining the Reset() call there, and calling it in a single defer wherever that fits. There are a few places in this PR where I would not use defer, because the instance would then stay with the consuming component unnecessarily; there I would release it manually so other callers can use it ASAP. Let me know if this is acceptable.
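
A small sketch of the two release styles under discussion, assuming the hash.GetHasher/hash.PutHasher API from this PR (call sites are illustrative):

// Deferred release: a closure is created at the defer site and invoked
// on return - convenient, but extra work on every call.
func hashDeferred(data []byte) []byte {
	h := hash.GetHasher()
	defer func() {
		h.Reset()
		hash.PutHasher(h)
	}()
	h.Write(data)
	return h.Sum(nil)
}

// Manual release: reset and return the hasher as soon as the digest is
// taken, so other callers can reuse it immediately.
func hashManual(data []byte) []byte {
	h := hash.GetHasher()
	h.Write(data)
	sum := h.Sum(nil)
	h.Reset()
	hash.PutHasher(h)
	return sum
}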

@poszu (Contributor):

> The methods are documented as requiring consumers to call Reset(), just as some APIs document that consumers are expected to hold a lock on a mutex or other synchronization primitive.

I know, but there is nothing enforcing this rule, and if it can be forgotten, it will be forgotten. IMHO, if the hasher MUST be reset before it is returned to the pool, then the API should be implemented in a way that makes this mistake impossible. The easiest way to achieve that is to reset the hasher automatically in the Put function.
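
Concretely, the variant being suggested would look roughly like this (a sketch against the PR's pool, not the final code):

// PutHasher returns the hasher to the pool, resetting it first so a
// caller can never poison the pool with a dirty hasher.
func PutHasher(hasher *blake3.Hasher) {
	hasher.Reset()
	pool.Put(hasher)
}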

> I would suggest renaming PutHasher to PutResetHasher

I think that's not necessary. The user of this API doesn't need to know that the hasher is reset; they only want to get a hasher and return it.

> I did not put the cleanup in a defer statement because wrapping it in a closure adds at least two extra operations per call: one to evaluate the closure at the defer site, and another to invoke the anonymous function when the defer runs.

I don't understand: do you mean that

defer func() {
	h.Reset()
	hash.PutHasher(h)
}()

is too much overhead because of the extra closure? I didn't check, but I would be surprised if that weren't inlined.

> There are a few places in this PR where I would not use defer, because the instance would then stay with the consuming component unnecessarily; there I would release it manually so other callers can use it ASAP.

I understand, then it's good to release it manually ASAP as you say. An alternative could be to extract the hashing part into its own function and keep relying on defer there.
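
A sketch of that alternative, using a hypothetical helper name: the hasher's lifetime is confined to one small function, so defer still releases it promptly:

// hashBytes is a hypothetical helper; the hasher never escapes it, so
// defer returns the hasher to the pool as soon as the digest is computed.
// It assumes the auto-resetting PutHasher proposed above.
func hashBytes(data []byte) []byte {
	h := hash.GetHasher()
	defer hash.PutHasher(h)
	h.Write(data)
	return h.Sum(nil)
}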

@acud (Contributor, Author):

Thanks @poszu, I'll amend as suggested.

And as you said, the defer closure does indeed get inlined (I wrote this simple example to see it with my own eyes): https://termbin.com/umrn
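
For the record, the compiler's inlining decisions can be inspected with go build -gcflags='-m'. A self-contained toy along the lines of that experiment (illustrative only, not the actual termbin paste) might look like:

package main

import "fmt"

type hasher struct{ n int }

func (h *hasher) Reset() { h.n = 0 }

func put(h *hasher) {}

// Building with -gcflags='-m' reports whether the deferred closure
// in use() is inlined.
func use(h *hasher) {
	defer func() {
		h.Reset()
		put(h)
	}()
	h.n++
}

func main() {
	h := &hasher{}
	use(h)
	fmt.Println(h.n) // 0: the deferred closure ran and reset the state
}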

Comment on lines 275 to 283
func (b *Ballot) HashInnerBytes() []byte {
-	h := hash.New()
+	h := hash.GetHasher()
	_, err := codec.EncodeTo(h, &b.InnerBallot)
	if err != nil {
		log.With().Fatal("failed to encode InnerBallot for hashing", log.Err(err))
	}
-	return h.Sum(nil)
+	sum := h.Sum(nil)
+	h.Reset()
+	hash.PutHasher(h)
+	return sum
}
@poszu (Contributor):

I think it would be best to always put it back with a defer, to avoid leaking hashers when the code is refactored in the future to introduce an early exit (for example, for error handling). Perhaps consider an API similar to the one in the context package, i.e. return the "free" function along with the hasher. Using it would look as follows:

h, free := hash.GetHasher()
defer free()

hash/pool.go Outdated
Comment on lines 17 to 29
// GetHasher will get a blake3 hasher from the pool.
// It may or may not allocate a new one. Consumers are expected
// to call Reset() on the hasher before putting it back in
// the pool.
func GetHasher() *blake3.Hasher {
	return pool.Get().(*blake3.Hasher)
}

// PutHasher returns the hasher back to the pool.
// Consumers are expected to call Reset() on the
// instance before putting it back in the pool.
func PutHasher(hasher *blake3.Hasher) {
	pool.Put(hasher)
}
@poszu (Contributor):

The proposed API would look like:

Suggested change:

-// GetHasher will get a blake3 hasher from the pool.
-// It may or may not allocate a new one. Consumers are expected
-// to call Reset() on the hasher before putting it back in
-// the pool.
-func GetHasher() *blake3.Hasher {
-	return pool.Get().(*blake3.Hasher)
-}
-
-// PutHasher returns the hasher back to the pool.
-// Consumers are expected to call Reset() on the
-// instance before putting it back in the pool.
-func PutHasher(hasher *blake3.Hasher) {
-	pool.Put(hasher)
-}
+// GetHasher will get a blake3 hasher from the pool.
+// It may or may not allocate a new one. Consumers must
+// call the returned function when they don't need the hasher anymore
+// to return it back to the pool. Forgetting to do so would leak the hasher.
+func GetHasher() (*blake3.Hasher, func()) {
+	hasher := pool.Get().(*blake3.Hasher)
+	free := func() {
+		hasher.Reset()
+		pool.Put(hasher)
+	}
+	return hasher, free
+}

@fasmat (Member) commented Jun 18, 2024:

Alternatively, we could return a type that wraps a *blake3.Hasher but also implements a Close() method that returns it to the pool, and then:

hh := hash.New()
defer hh.Close()

where New fetches from the pool and Close puts it back. But I'm OK with the other pattern as well.
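
A sketch of that wrapper pattern (names are illustrative; embedding keeps the full blake3 hasher API available on the wrapper):

// Hasher wraps *blake3.Hasher; embedding exposes Write, Sum, Reset, etc.
type Hasher struct {
	*blake3.Hasher
}

// New fetches a hasher from the pool.
func New() *Hasher {
	return &Hasher{Hasher: pool.Get().(*blake3.Hasher)}
}

// Close resets the wrapped hasher and returns it to the pool. Returning
// an error satisfies io.Closer, so `defer hh.Close()` reads naturally.
func (h *Hasher) Close() error {
	h.Reset()
	pool.Put(h.Hasher)
	return nil
}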

@acud acud self-assigned this Jun 14, 2024
@acud acud force-pushed the blake3cache branch 2 times, most recently from fb48a6e to e06dff6, June 17, 2024 18:21
@acud (Contributor, Author) commented Jun 17, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 17, 2024
@spacemesh-bors

try

Build succeeded.

@acud acud requested a review from poszu June 18, 2024 13:50
@acud acud marked this pull request as ready for review June 18, 2024 13:50
@acud acud requested review from dshulyak, fasmat and ivan4th as code owners June 18, 2024 13:50
@acud (Contributor, Author) commented Jun 18, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jun 18, 2024
@spacemesh-bors

Pull request successfully merged into develop.

Build succeeded.

@spacemesh-bors spacemesh-bors bot changed the title Use sync.Pool for blake3 hashers [Merged by Bors] - Use sync.Pool for blake3 hashers Jun 18, 2024
@spacemesh-bors spacemesh-bors bot closed this Jun 18, 2024
@spacemesh-bors spacemesh-bors bot deleted the blake3cache branch June 18, 2024 15:27
Successfully merging this pull request may close these issues:

  • Node should re-use instances of hash.Hash by using a sync.Pool