-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TxManager] use TxManager to handle transactions with effects #6652
Conversation
9abb04e
to
e9cf311
Compare
e9cf311
to
e2672f9
Compare
0a0c036
to
c0b8262
Compare
c0b8262
to
614b7b6
Compare
After Gateway deprecation, since we plan to execute certificates with effects in execution driver, there should not be a case where AuthorityState is launched but execution driver is not needed. Keeping execution driver in activate authority no longer brings in benefits. But merging execution driver into AuthorityState has additional values: 1. More straight forward to wire TransactionManager to execution driver. 2. Less boilerplate related to metrics. Since execution driver requires an Arc<> reference to AuthorityState, AuthorityState::new() is refactored to return Arc<AuthorityState> instead. This removes the need for callers to wrap AuthorityState wit Arc themselves. This change is a prerequisite for #6652, which uses TransactionManager for handle_certificate_with_effects() by default. Without running execution driver, a number of tests would fail. An alternative approach is to launch ActiveAuthority when execution driver is needed, which is going to be almost all cases for tests and production code. The additional boilerplate seems unnecessary.
614b7b6
to
e399d2b
Compare
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
crates/sui-core/src/authority.rs
Outdated
.notify_read_effects(vec![tx_digest]) | ||
.await? | ||
.pop() | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this unwrap safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in the expect()
call. notify_read_effects()
with 1 element input vector should return exact 1 element output vector, so the unwrap() on pop() is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out before checkpoint v2 can help fullnode catch up, TransactionOrchestrator needs to initiate fetching missing parents via node sync. This behavior is still tested. I updated the comment to replace this function after checkpoint v2.
@@ -309,6 +309,8 @@ where | |||
metrics.tx_directly_executed.inc(); | |||
Ok(()) | |||
} | |||
// TODO: Delete this branch because this error should be impossible to happen with | |||
// execute_certificate_with_effects(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change the debug!
to an error!
then
e399d2b
to
eba0fd7
Compare
- Rename handle_certificate* to execute_certificate* - In both execute_certificate() and execute_certificate_with_effects(), pass the certificate to TransactionManager then wait for its effects. - The previous logic of executing transactions without waiting for input is renamed to execute_transaction_internal()
eba0fd7
to
ac2ecc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, which you can take or leave, I don't feel very strongly about them.
let certs = vec![certificate.clone()]; | ||
self.database | ||
.epoch_store() | ||
.insert_pending_certificates(&certs)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we actually need to persist the certificate here - CheckpointExecutor will handle retries if we crash/restart.
let certs = vec![certificate.clone()]; | ||
self.database | ||
.epoch_store() | ||
.insert_pending_certificates(&certs)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above, but there is a stronger case for leaving this one in, since we may want to persist certificates that arrive via RPC.
Yeah I think this is related to one of our previous discussions, i.e. whether TransactionManager should manage the pending certificates itself, or have Narwhal replay certificates for crash recovery. The considerations are similar. Approach 1: TransactionManager manages certificates pending executionThis is close to the current approach. Executions can happen in parallel, so transactions do not finish executions in order. Narwhal can avoid the complexity of keeping track of the low watermark of executed certificates, which would be necessary if Narwhal is responsible to replay pending certificates. I wonder if checkpoint may be able to avoid some complexity too by just relying on TransactionManager to manage pending certificates. For user submitted RPCs, keeping track of pending certificates may speed up crash recovery a little. Approach 2: Narwhal and Checkpoint manage certificates pending executionIn this approach, we would be able to potentially save a few disk writes in TransactionManager. The downside is that both Narwhal and Checkpoint need to keep track of and persist lower watermark of executed transactions, which may increase complexity. An intermediate approach, e.g. TransactionManager only manages pending certificates from Narwhal, seems less ideal to either approach above. My current preference is approach 1, because it seems simpler. But there could be other reasons why checkpointing needs to keep track of low watermark of executed transactions. We can continue the discussion next week and see what we can improve here. |
Currently
AuthorityState::handle_certificate_with_effects()
does not enqueue transactions toTransactionManager
. Instead, it directly callsprocess_certificate()
to execute transactions, which will return errors if any input object is not yet available. So callers have to retry until a transaction can succeed, which is inefficient.This PR refactors the the function to enqueue transactions to
TransactionManager
for execution. Now callers ofAuthorityState::handle_certificate_with_effects()
do not need to worry about checking errors or retries.