[optqs] bug fixes and perf improvements #15452
Conversation
// NB: this might append the same signers multiple times, but this
// should be very rare and has no negative effects.
for responders in existing_responders {
    responders.lock().append(&mut signers.clone());
}
It feels awkward; won't every retry append duplicate signers into the responders?
Yeah, the alternative is very verbose: I'd need to check whether the vector is a subset and then add only the new entries efficiently. There is no harm in re-adding though. Since they're appended at the end, it's effectively like cycling over the original vector again?
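For illustration only, a minimal sketch of the subset-check-before-append idea mentioned above, assuming the responder list is an Arc&lt;Mutex&lt;Vec&lt;PeerId&gt;&gt;&gt;; the PeerId alias and the helper name are hypothetical, not the actual aptos-core types:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Hypothetical peer identifier; in aptos-core this would be a validator address type.
type PeerId = u64;

/// Append only the signers that are not already in the responder list, so that
/// repeated retries cannot keep growing the vector with duplicates.
fn append_new_signers(responders: &Arc<Mutex<Vec<PeerId>>>, signers: &[PeerId]) {
    let mut guard = responders.lock().unwrap();
    let existing: HashSet<PeerId> = guard.iter().copied().collect();
    guard.extend(signers.iter().copied().filter(|s| !existing.contains(s)));
}
```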
I thought the vector grows unboundedly with the iterations, but it seems it can at most duplicate once?
It can grow if the fetch keeps failing forever.
I thought every iteration starts with the QC signers + batch signers? Or am I missing something?
let mut signers = signers.clone();
signers.append(&mut summary.signers(ordered_authors));
The code reads as very confusing to me. I see that batches_and_responders is re-created, but I don't see data_ptr.responders getting the same treatment, so data_ptr.responders seems to keep growing?
🤦🏽♂️ yeah you are right. I am not replacing data_ptr.responders. Agree, it is confusing.
Maybe I should just re-issue the request instead of trying to send the updated peers via the Arc&lt;Mutex&lt;&gt;&gt;, and accept that there might be duplicate fetches on the QS side. This should be rare anyhow? That way the code might be much cleaner.
Orthogonally, maybe we should make the QS fetcher smarter and consolidate fetches for the same batches instead of issuing duplicate concurrent fetches.
Okay, I am now updating the responders properly too. It's still not straightforward code. The real fix is to change batch_requester and BatchReaderImpl to efficiently handle duplicate get requests; that way the payload manager code can be very minimal, just fetching without caring about tracking previous requests. I can do that in the next PR. wdyt?
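As a rough sketch of the "handle duplicate get requests" direction (not the actual batch_requester/BatchReaderImpl API; every type and name here is hypothetical), concurrent requests for the same digest could share a single in-flight fetch, for example via tokio's OnceCell:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::OnceCell;

// Hypothetical stand-ins for the real batch types.
type Digest = [u8; 32];
type Batch = Vec<u8>;

/// Duplicate get requests for the same digest share one cell; only the first
/// caller performs the underlying fetch, everyone else awaits the same result.
/// (Entries are never evicted in this sketch.)
#[derive(Default)]
struct CoalescingFetcher {
    in_flight: Mutex<HashMap<Digest, Arc<OnceCell<Batch>>>>,
}

impl CoalescingFetcher {
    async fn get(&self, digest: Digest) -> Batch {
        let cell = {
            let mut map = self.in_flight.lock().unwrap();
            map.entry(digest).or_default().clone()
        };
        // get_or_init runs the fetch at most once per cell; concurrent callers
        // for the same digest simply wait for that one fetch to finish.
        cell.get_or_init(|| fetch_from_network(digest)).await.clone()
    }
}

async fn fetch_from_network(_digest: Digest) -> Batch {
    // Stand-in for the real network fetch from the responder set.
    vec![0u8; 16]
}
```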
self.payload_manager.get_transactions(block).await?;

let mut block_voters = None;
let (txns, max_txns_from_block_to_execute) = loop {
I'm not sure how this works? It reads like it always calls get_transactions with None.
It creates the two initial futures first (which always use None), then selects on the result and the QC; regardless of which one comes back, it returns, so it'll never use block_voters to call get_transactions again?
Line 58 sets block_voters to Some() and continues with the next iteration of the loop. If block_qc_fut returns a QC, then we extract the voters bitvec and use it for the next iteration. We only break if get_transactions returns a result.
Oh, it starts the new iteration with new get_transactions futures 🤦♂️
I figured out that what confused me is the new iteration of the loop; maybe it's less confusing to write it like this?
select! {
    qc => {
        get_transactions()?
    }
    txns = get_transactions => txns
}
I see, yeah. Updated the code.
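For readers following this thread, here is a self-contained toy sketch of the retry pattern being discussed, assuming tokio; the names (get_transactions, the oneshot QC channel, the voters bitvec) are illustrative stand-ins, not the actual aptos-core API. Each loop iteration rebuilds the fetch future; if the QC arrives first, its voters are recorded and the fetch is retried with the wider responder set:

```rust
use tokio::sync::oneshot;
use tokio::time::{sleep, Duration};

// Toy fetch: pretends the fetch only succeeds once the QC voters are known.
async fn get_transactions(voters: Option<Vec<u8>>) -> Result<Vec<u64>, &'static str> {
    match voters {
        Some(_) => Ok(vec![1, 2, 3]),
        None => {
            // Simulate a fetch that never completes without the extra responders.
            sleep(Duration::from_secs(3600)).await;
            Err("timed out")
        }
    }
}

#[tokio::main]
async fn main() {
    // Toy channel standing in for the block QC arriving asynchronously.
    let (qc_tx, mut qc_rx) = oneshot::channel::<Vec<u8>>();
    tokio::spawn(async move {
        sleep(Duration::from_millis(50)).await;
        let _ = qc_tx.send(vec![0b1011]); // pretend voters bitvec
    });

    let mut block_voters: Option<Vec<u8>> = None;
    let txns = loop {
        tokio::select! {
            // The fetch completed: break out of the loop with the result.
            result = get_transactions(block_voters.clone()) => break result.unwrap(),
            // The QC arrived first: record its voters and retry the fetch with them.
            Ok(voters) = &mut qc_rx, if block_voters.is_none() => {
                block_voters = Some(voters);
            }
        }
    };
    println!("fetched {} txns", txns.len());
}
```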
Description
This PR includes the following fixes:
- In optQS, batches can only be fetched from consensus and the batch proposer until the block gets a QC. After the QC forms, batches should be fetched from the QC signers as well to guarantee progress (see the sketch below).
- Builds on top of the previous commit to update the responders in flight, without waiting for the previous fetch to complete.
- If a validator fetches in the critical path of a proposal and then votes, it is possible that it already has a SyncInfo for that round because other validators have moved on. In such a case, the validator should ignore the proposal instead of voting. This issue showed up in logs because some validators couldn't verify the Vote message, as the vote round and the sync info QC round were the same.
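Purely as an illustration of the first fix (hypothetical types and names, not the actual aptos-core code), the responder set for a batch fetch could be widened once the block's QC is available:

```rust
use std::collections::HashSet;

// Hypothetical peer identifier; in aptos-core this would be a validator address type.
type PeerId = u64;

/// Peers a batch may be fetched from. Before the block has a QC, only the
/// proposer and the batch signers are assumed to hold it; once a QC exists,
/// its signers are added too, since validators that voted for the block should
/// be able to serve its batches, which guarantees progress.
fn fetch_responders(
    proposer: PeerId,
    batch_signers: &[PeerId],
    qc_signers: Option<&[PeerId]>,
) -> Vec<PeerId> {
    let mut peers: HashSet<PeerId> = HashSet::new();
    peers.insert(proposer);
    peers.extend(batch_signers.iter().copied());
    if let Some(qc_signers) = qc_signers {
        peers.extend(qc_signers.iter().copied());
    }
    peers.into_iter().collect()
}
```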