[Merged by Bors] - Use sync.Pool for blake3 hashers #6044

Closed · acud wants to merge 4 commits into develop from blake3cache

Conversation

@acud (Contributor) commented Jun 13, 2024

Motivation

Improve memory allocations using a memory pool for hashers.

Description

Our nodes allocate a lot of blake3 hashers, and this is clearly visible in heap profiles:

(pprof) sample_index=alloc_space
(pprof) top
Showing nodes accounting for 51611.76GB, 79.82% of 64661.28GB total
Dropped 3308 nodes (cum <= 323.31GB)
Showing top 10 nodes out of 134
      flat  flat%   sum%        cum   cum%
28458.86GB 44.01% 44.01% 28458.86GB 44.01%  github.com/zeebo/blake3.New
 7623.76GB 11.79% 55.80%  7625.85GB 11.79%  github.com/libp2p/go-buffer-pool.(*BufferPool).Get
 4043.65GB  6.25% 62.06%  4043.65GB  6.25%  github.com/libp2p/go-libp2p-pubsub/pb.(*ControlIHave).Unmarshal
 3523.27GB  5.45% 67.50%  3523.27GB  5.45%  github.com/libp2p/go-libp2p-pubsub/pb.(*Message).Unmarshal
 2199.32GB  3.40% 70.91%  2199.32GB  3.40%  github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).getPeers

This PR amortizes the allocation cost of these hashers over time by drawing them from a sync.Pool instead of using them as one-off disposable objects. The hashers are built to be reused via the Reset() method on the hash.Hasher interface for exactly this purpose.
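
For reference, the pool backing this change can be as small as the following sketch (a minimal illustration assuming the zeebo/blake3 package; the actual code lives in hash/pool.go in this diff):

package hash

import (
	"sync"

	"github.com/zeebo/blake3"
)

// pool holds reusable blake3 hashers. sync.Pool amortizes allocations
// across goroutines and lets the GC reclaim idle hashers under pressure.
var pool = sync.Pool{
	New: func() any { return blake3.New() },
}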

Fixes #5850

Test Plan

  • Manually test

@acud (Contributor, Author) commented Jun 13, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 13, 2024
@acud (Contributor, Author) commented Jun 13, 2024

bors cancel

@codecov bot commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.4%. Comparing base (8228d0c) to head (e906675).

Files                        Patch %   Lines
eligibility/fixedoracle.go   50.0%     2 Missing ⚠️
genvm/vm.go                  83.3%     0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #6044   +/-   ##
=======================================
  Coverage     81.4%   81.4%           
=======================================
  Files          297     299    +2     
  Lines        31965   31999   +34     
=======================================
+ Hits         26043   26073   +30     
- Misses        4225    4229    +4     
  Partials      1697    1697           


@spacemesh-bors

try

Build failed.

@acud (Contributor, Author) commented Jun 14, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 14, 2024
@spacemesh-bors

try

Build failed.

@acud (Contributor, Author) commented Jun 14, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 14, 2024
@spacemesh-bors

try

Build succeeded.

@poszu (Contributor) left a comment:

You could add Closes #5850 in the description.

Comment on lines 138 to 139
h.Reset()
hash.PutHasher(h)
@poszu (Contributor):

Forgetting to Reset() the hasher would have tragic consequences. How about resetting it automatically in PutHasher()?

@acud (Contributor, Author):

Thanks @poszu for the review. This came up while sketching the change out, and here is the reasoning behind doing it this way:

  • Hashing is a cheap operation with a usually compact usage site, so it is easy to see where a hasher is acquired, make sure it is not held for too long, and release it manually so it goes back into the pool ASAP.
  • I kept the pool plain so that objects simply move in and out of it, without extra boilerplate around it.
  • The methods are documented as requiring consumers to call Reset(), just as some APIs document that consumers are expected to hold a lock on a mutex or other synchronization primitive.
  • I did not put the cleanup in a defer statement because wrapping it in a closure adds at least two extra operations per call: one to evaluate the closure at the defer site, and another to invoke the anonymous function when the defer runs. This could be avoided by calling defer twice, once for Reset and once for PutHasher, but since we do a lot of hashing I would rather shave overhead off all of these corollary operations (see the sketch after this list).
  • If you insist on having the cleanup in a defer statement (usually I would vote for the same, apart from a few exceptions), I would suggest renaming PutHasher to PutResetHasher, inlining the Reset() call there, and calling it in a single defer wherever that fits. There are a few places in this PR where I would not use defer, because the instance would then stay with the consuming component unnecessarily; there I would release it manually so other callers can use it ASAP. Let me know if this is acceptable.
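
A small sketch of the two release styles under discussion, assuming the hash.GetHasher/hash.PutHasher API from this PR (call sites are illustrative):

// Deferred release: a closure is created at the defer site and invoked
// on return - convenient, but extra work on every call.
func hashDeferred(data []byte) []byte {
	h := hash.GetHasher()
	defer func() {
		h.Reset()
		hash.PutHasher(h)
	}()
	h.Write(data)
	return h.Sum(nil)
}

// Manual release: reset and return the hasher as soon as the digest is
// taken, so other callers can reuse it immediately.
func hashManual(data []byte) []byte {
	h := hash.GetHasher()
	h.Write(data)
	sum := h.Sum(nil)
	h.Reset()
	hash.PutHasher(h)
	return sum
}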

@poszu (Contributor):

> The methods are documented as requiring consumers to call Reset(), just as some APIs document that consumers are expected to hold a lock on a mutex or other synchronization primitive.

I know, but there is nothing enforcing this rule, and if it can be forgotten, it will be forgotten. IMHO, if the hasher MUST be reset before it is returned to the pool, then the API should be implemented in a way that makes this mistake impossible. The easiest way to achieve that is to reset the hasher automatically in the Put function.
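
Concretely, the variant being suggested would look roughly like this (a sketch against the PR's pool, not the final code):

// PutHasher returns the hasher to the pool, resetting it first so a
// caller can never poison the pool with a dirty hasher.
func PutHasher(hasher *blake3.Hasher) {
	hasher.Reset()
	pool.Put(hasher)
}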

> I would suggest renaming PutHasher to PutResetHasher

I think that's not necessary. The user of this API doesn't need to know that the hasher is reset; they only want to get a hasher and return it.

> I did not put the cleanup in a defer statement because wrapping it in a closure adds at least two extra operations per call: one to evaluate the closure at the defer site, and another to invoke the anonymous function when the defer runs.

I don't understand: do you mean that

defer func() {
	h.Reset()
	hash.PutHasher(h)
}()

is too much overhead because of the extra closure? I didn't check, but I would be surprised if that weren't inlined.

> There are a few places in this PR where I would not use defer, because the instance would then stay with the consuming component unnecessarily; there I would release it manually so other callers can use it ASAP.

I understand, then it's good to release it manually ASAP as you say. An alternative could be to extract the hashing part into its own function and keep relying on defer there.
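
A sketch of that alternative, using a hypothetical helper name: the hasher's lifetime is confined to one small function, so defer still releases it promptly:

// hashBytes is a hypothetical helper; the hasher never escapes it, so
// defer returns the hasher to the pool as soon as the digest is computed.
// It assumes the auto-resetting PutHasher proposed above.
func hashBytes(data []byte) []byte {
	h := hash.GetHasher()
	defer hash.PutHasher(h)
	h.Write(data)
	return h.Sum(nil)
}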

@acud (Contributor, Author):

Thanks @poszu, I'll amend as suggested.

And as you said, the defer closure does indeed get inlined (I wrote this simple example to see it with my own eyes): https://termbin.com/umrn
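
For the record, the compiler's inlining decisions can be inspected with go build -gcflags='-m'. A self-contained toy along the lines of that experiment (illustrative only, not the actual termbin paste) might look like:

package main

import "fmt"

type hasher struct{ n int }

func (h *hasher) Reset() { h.n = 0 }

func put(h *hasher) {}

// Building with -gcflags='-m' reports whether the deferred closure
// in use() is inlined.
func use(h *hasher) {
	defer func() {
		h.Reset()
		put(h)
	}()
	h.n++
}

func main() {
	h := &hasher{}
	use(h)
	fmt.Println(h.n) // 0: the deferred closure ran and reset the state
}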

Comment on lines 275 to 283
func (b *Ballot) HashInnerBytes() []byte {
-	h := hash.New()
+	h := hash.GetHasher()
	_, err := codec.EncodeTo(h, &b.InnerBallot)
	if err != nil {
		log.With().Fatal("failed to encode InnerBallot for hashing", log.Err(err))
	}
-	return h.Sum(nil)
+	sum := h.Sum(nil)
+	h.Reset()
+	hash.PutHasher(h)
+	return sum
}
@poszu (Contributor):

I think it would be best to always put it back with a defer, to avoid leaking hashers when the code is refactored in the future to introduce an early exit (for example, for error handling). Perhaps consider an API similar to the one in the context package, i.e. return the "free" function along with the hasher. Using it would look as follows:

h, free := hash.GetHasher()
defer free()

hash/pool.go Outdated
Comment on lines 17 to 29
// GetHasher will get a blake3 hasher from the pool.
// It may or may not allocate a new one. Consumers are expected
// to call Reset() on the hasher before putting it back in
// the pool.
func GetHasher() *blake3.Hasher {
	return pool.Get().(*blake3.Hasher)
}

// PutHasher returns the hasher back to the pool.
// Consumers are expected to call Reset() on the
// instance before putting it back in the pool.
func PutHasher(hasher *blake3.Hasher) {
	pool.Put(hasher)
}
@poszu (Contributor):

The proposed API would look like:

Suggested change:

-// GetHasher will get a blake3 hasher from the pool.
-// It may or may not allocate a new one. Consumers are expected
-// to call Reset() on the hasher before putting it back in
-// the pool.
-func GetHasher() *blake3.Hasher {
-	return pool.Get().(*blake3.Hasher)
-}
-
-// PutHasher returns the hasher back to the pool.
-// Consumers are expected to call Reset() on the
-// instance before putting it back in the pool.
-func PutHasher(hasher *blake3.Hasher) {
-	pool.Put(hasher)
-}
+// GetHasher will get a blake3 hasher from the pool.
+// It may or may not allocate a new one. Consumers must
+// call the returned function when they don't need the hasher anymore
+// to return it back to the pool. Forgetting to do so would leak the hasher.
+func GetHasher() (*blake3.Hasher, func()) {
+	hasher := pool.Get().(*blake3.Hasher)
+	free := func() {
+		hasher.Reset()
+		pool.Put(hasher)
+	}
+	return hasher, free
+}

@fasmat (Member) commented Jun 18, 2024:

Alternatively, we could return a type that wraps a *blake3.Hasher but also implements a Close() method that returns it to the pool, and then:

hh := hash.New()
defer hh.Close()

where New fetches from the pool and Close puts it back. But I'm OK with the other pattern as well.
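
A sketch of that wrapper pattern (names are illustrative; embedding keeps the full blake3 hasher API available on the wrapper):

// Hasher wraps *blake3.Hasher; embedding exposes Write, Sum, Reset, etc.
type Hasher struct {
	*blake3.Hasher
}

// New fetches a hasher from the pool.
func New() *Hasher {
	return &Hasher{Hasher: pool.Get().(*blake3.Hasher)}
}

// Close resets the wrapped hasher and returns it to the pool. Returning
// an error satisfies io.Closer, so `defer hh.Close()` reads naturally.
func (h *Hasher) Close() error {
	h.Reset()
	pool.Put(h.Hasher)
	return nil
}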

@acud acud self-assigned this Jun 14, 2024
@acud acud force-pushed the blake3cache branch 2 times, most recently from fb48a6e to e06dff6, June 17, 2024 18:21
@acud (Contributor, Author) commented Jun 17, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jun 17, 2024
@spacemesh-bors

try

Build succeeded.

@acud acud requested a review from poszu June 18, 2024 13:50
@acud acud marked this pull request as ready for review June 18, 2024 13:50
@acud acud requested review from dshulyak, fasmat and ivan4th as code owners June 18, 2024 13:50
@acud (Contributor, Author) commented Jun 18, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jun 18, 2024
@spacemesh-bors

Pull request successfully merged into develop.

Build succeeded.

@spacemesh-bors spacemesh-bors bot changed the title Use sync.Pool for blake3 hashers [Merged by Bors] - Use sync.Pool for blake3 hashers Jun 18, 2024
@spacemesh-bors spacemesh-bors bot closed this Jun 18, 2024
@spacemesh-bors spacemesh-bors bot deleted the blake3cache branch June 18, 2024 15:27
Successfully merging this pull request may close these issues:

  • Node should re-use instances of hash.Hash by using a sync.Pool