-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement 'batch mode' for persisting allocations on the client. #9093
Conversation
22c5c43
to
657a7d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ashtuchkin!
The concept of this change looks good. From an implementation perspective we generally try to avoid having boolean flags, especially in a case like this where the flag ends up having to be passed down thru a few layers of method calls to actually be used. I've marked the specific changes I'd recommend in the review.
(As an aside, I was wondering why we don't see this kind of problem in the server and of course it turns out that raft is batching changes. So the concept here is certainly solid and has precedent in Nomad.)
I'd also like to flag this change for @schmichael and @nickethier because I know they're working on an experiment that ran into client performance issues, so it might be of interest to them.
client/allocrunner/alloc_runner.go
Outdated
@@ -877,36 +877,37 @@ func (ar *allocRunner) destroyImpl() { | |||
ar.destroyedLock.Unlock() | |||
} | |||
|
|||
func (ar *allocRunner) PersistState() error { | |||
func (ar *allocRunner) PersistState(inBatch bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only one caller for PersistState
, so this argument is always true. Let's remove this flag.
client/state/state_database.go
Outdated
func (s *BoltStateDB) PutAllocation(alloc *structs.Allocation) error { | ||
return s.db.Update(func(tx *boltdd.Tx) error { | ||
func (s *BoltStateDB) PutAllocation(alloc *structs.Allocation, inBatch bool) error { | ||
return s.updateOrBatch(inBatch, func(tx *boltdd.Tx) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having the boolean behavior flag and this function indirection, let's make the signature really super clear by splitting out a PutAllocation
and a PutAllocationBatch
, both of which call into a putAllocationImpl
with the appropriate boltdb function.
Same for PutNetworkStatus
and DeleteAllocationBucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. The naming here is tricky; I'd definitely expect PutAllocationBatch
to have a slice of allocations as the argument, not a single allocation.
That makes me think - if we add a new method to the interface anyway, maybe we can make it an actual "batch" variant that explicitly saves multiple allocations and not rely on hidden state using boltdb's Batch method? Wdyt?
This would require changing PersistState
to PersistStateBatch
too, but that shouldn't be hard to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can make it an actual "batch" variant that explicitly saves multiple allocations
I'm open to the idea of this but given how we run PersistState
by iterating over allocation runners (which requires setting and releasing locks on those runner), I suspect that's not going to be very nice for the callers. Give it a go if you'd like though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah handling destroyedLock
and stateLock
might be a problem with real batching.
I've fleshed out the two-method approach you suggested and it doesn't look tidy enough to my taste, so wanted to circle back for advice. Conceptually, that boolean flag is an adjustment in behavior (basically opt-in to a perf optimization), not a new behavior, so it looks weird when I have to duplicate more or less the same code in several places. It looks like this:
// PutAllocation stores an allocation or returns an error if it could
// not be stored.
PutAllocation(*structs.Allocation) error
// PutAllocationInBatch has exactly the same semantics as PutAllocation above,
// with the addition that the write transactions can be shared with other concurrent
// writes, increasing performance in batch scenarios.
PutAllocationInBatch(*structs.Allocation) error
// Get/Put NetworkStatus get and put the allocation's network
// status. It may be nil.
GetNetworkStatus(allocID string) (*structs.AllocNetworkStatus, error)
PutNetworkStatus(allocID string, ns *structs.AllocNetworkStatus) error
// PutNetworkStatusInBatch has exactly the same semantics as PutNetworkStatus above,
// with the addition that the write transactions can be shared with other concurrent
// writes, increasing performance in batch scenarios.
PutNetworkStatusInBatch(*structs.Allocation) error
So while I agree that using booleans to change behavior is a red flag, in this particular case it's arguably not changing semantics, just performance in particular scenarios.
I saw another approach to implementing such behavior adjustments using variadic options, something like this:
PutAllocation(*structs.Allocation, ...WriteOption) error
(see e.g. https://medium.com/soon-london/variadic-configuration-functions-in-go-8cef1c97ce99 for details)
That would avoid polluting cases where no such option is needed, while allowing these adjustments where needed. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would avoid polluting cases where no such option is needed, while allowing these adjustments where needed. What do you think?
Oh, that's super nice. Go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think it's ready for review.
Thanks for the review Tim! Will make the changes soon. |
657a7d9
to
6d9e54c
Compare
6d9e54c
to
778f797
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for submitting this @ashtuchkin. I'll let @tgross take care of reviewing and merging.
Have you confirmed this fixes your IO issue @ashtuchkin?
I'm pretty concerned the de-duplication in boltdd isn't helping. Seems like the transactional overhead may remove the IO benefit of the de-duplication in which case we may just be wasting CPU/memory doing it!
Temporary metrics or logging around de-duplication hits/misses may help tell if de-duping is happening. Sadly at a glance it seems like if the transaction overhead does destroy the theoretical IO savings it would take quite a bit of reworking to only opportunistically create the transaction on change.
In hindsight doing de-duplication at the StateDB level may have made more sense.
Thanks Michael!
Yes, I've deployed this version on my cluster and the IOPS does return to normal when the cluster becomes idle.
I traced commit path in boltdb and it does 2 unconditional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great @ashtuchkin! Thanks so much for your patience in getting this into shape.
I've left one question about the test assertion.
client/state/db_test.go
Outdated
// and MaxBatchSize parameters). | ||
if getTxID != nil { | ||
numTransactions := getTxID() - prevTxID | ||
require.Less(numTransactions, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this "< 10 transactions" determined empirically? The default MaxBatchSize is 1000, which implies we can assert "at least 2" but I'm less certain about the "at most" we can assert. Can we use a timer to figure out how many 10ms periods have passed to make a stronger assertion?
I just want to make sure we're not introducing a potential test flake (this has been a pain point), especially for full unit test runs with parallelism on the VMs used for CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep flake tests is a common pain point - I've wasted so much time on it too.
You're right, most of the time when running this test we get 2 transactions, each with 1000 writes. Time is usually not a limiting factor here – spawning goroutines is pretty fast and writes are all in-memory (the only I/O we have is when these 2 transactions are committed). I used 10 as a "safe" threshold that should hold even under load, but that's of course arbitrary.
I initially opted against adding a more specific "expected number of transactions" calculation here because that would add coupling to specific algorithm that boltdb uses, in addition to adding complexity. Now that I'm thinking about, the first concern is probably not relevant as boltdb is frozen, so ok, let's do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think it's ready for review. I've added a separate commit for this change so that it's easier to see what exactly changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
170b77e
to
5c43490
Compare
Fixes hashicorp#9047, see problem details there. As a solution, we use BoltDB's 'Batch' mode that combines multiple parallel writes into small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.
5c43490
to
556043b
Compare
@ashtuchkin That makes complete sense, and yes! Boltdd can definitely do that tracking. Should be as easy as adding a Probably worth it's own PR since this one is so close to merging. If you don't have time to do that PR, feel free to file an issue or let me know and I can file an issue. |
@ashtuchkin I've merged this and got it onto the changelog in #9132. This will ship in Nomad 1.0! Thanks again! |
Awesome, thank you Tim! |
…hicorp#9093) Fixes hashicorp#9047, see problem details there. As a solution, we use BoltDB's 'Batch' mode that combines multiple parallel writes into small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #9047, see problem details there.
As a solution, we use BoltDB's 'Batch' mode that opportunistically combines multiple parallel writes into a small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.