Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TxManager] use TxManager to handle transactions with effects #6652

Merged
merged 2 commits into from
Dec 10, 2022

Conversation

mwtian
Copy link
Contributor

@mwtian mwtian commented Dec 8, 2022

Currently AuthorityState::handle_certificate_with_effects() does not enqueue transactions to TransactionManager. Instead, it directly calls process_certificate() to execute transactions, which will return errors if any input object is not yet available. So callers have to retry until a transaction can succeed, which is inefficient.

This PR refactors the the function to enqueue transactions to TransactionManager for execution. Now callers of AuthorityState::handle_certificate_with_effects() do not need to worry about checking errors or retries.

@mwtian mwtian force-pushed the exec-for-effects branch 4 times, most recently from 9abb04e to e9cf311 Compare December 8, 2022 05:14
@mwtian mwtian changed the title [WIP] [TxManager] rely on TxManager for handle_certificate() and handle_certificate_with_effects() Dec 8, 2022
@mwtian mwtian requested review from mystenmark and lxfind December 8, 2022 06:00
@mwtian mwtian marked this pull request as ready for review December 8, 2022 06:00
@mwtian mwtian requested a review from andll December 8, 2022 06:27
@mwtian mwtian force-pushed the exec-for-effects branch 2 times, most recently from 0a0c036 to c0b8262 Compare December 8, 2022 19:46
@mwtian mwtian changed the title [TxManager] rely on TxManager for handle_certificate() and handle_certificate_with_effects() [TxManager] use TxManager to handle transactions by default Dec 8, 2022
@mwtian mwtian changed the title [TxManager] use TxManager to handle transactions by default [TxManager] use TxManager to handle transactions with effects Dec 9, 2022
mwtian added a commit that referenced this pull request Dec 9, 2022
After Gateway deprecation, since we plan to execute certificates with effects in execution driver, there should not be a case where AuthorityState is launched but execution driver is not needed. Keeping execution driver in activate authority no longer brings in benefits. But merging execution driver into AuthorityState has additional values:

1. More straight forward to wire TransactionManager to execution driver.
2. Less boilerplate related to metrics.

Since execution driver requires an Arc<> reference to AuthorityState, AuthorityState::new() is refactored to return Arc<AuthorityState> instead. This removes the need for callers to wrap AuthorityState wit Arc themselves.

This change is a prerequisite for #6652, which uses TransactionManager for handle_certificate_with_effects() by default. Without running execution driver, a number of tests would fail.

An alternative approach is to launch ActiveAuthority when execution driver is needed, which is going to be almost all cases for tests and production code. The additional boilerplate seems unnecessary.
@vercel
Copy link

vercel bot commented Dec 10, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated
wallet-adapter ✅ Ready (Inspect) Visit Preview 💬 Add your feedback Dec 10, 2022 at 2:50AM (UTC)

.notify_read_effects(vec![tx_digest])
.await?
.pop()
.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this unwrap safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment in the expect() call. notify_read_effects() with 1 element input vector should return exact 1 element output vector, so the unwrap() on pop() is safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out before checkpoint v2 can help fullnode catch up, TransactionOrchestrator needs to initiate fetching missing parents via node sync. This behavior is still tested. I updated the comment to replace this function after checkpoint v2.

@@ -309,6 +309,8 @@ where
metrics.tx_directly_executed.inc();
Ok(())
}
// TODO: Delete this branch because this error should be impossible to happen with
// execute_certificate_with_effects().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's change the debug! to an error! then

- Rename handle_certificate* to execute_certificate*
- In both execute_certificate() and execute_certificate_with_effects(),
pass the certificate to TransactionManager then wait for its effects.
- The previous logic of executing transactions without waiting for input
  is renamed to execute_transaction_internal()
@vercel vercel bot temporarily deployed to Preview – explorer-storybook December 10, 2022 01:46 Inactive
@vercel vercel bot temporarily deployed to Preview – wallet-adapter December 10, 2022 01:46 Inactive
@vercel vercel bot temporarily deployed to Preview – wallet-adapter December 10, 2022 02:50 Inactive
@vercel vercel bot temporarily deployed to Preview – explorer-storybook December 10, 2022 02:50 Inactive
@mwtian mwtian requested a review from mystenmark December 10, 2022 03:05
@mwtian mwtian enabled auto-merge (squash) December 10, 2022 04:33
Copy link
Contributor

@mystenmark mystenmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, which you can take or leave, I don't feel very strongly about them.

let certs = vec![certificate.clone()];
self.database
.epoch_store()
.insert_pending_certificates(&certs)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually need to persist the certificate here - CheckpointExecutor will handle retries if we crash/restart.

let certs = vec![certificate.clone()];
self.database
.epoch_store()
.insert_pending_certificates(&certs)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, but there is a stronger case for leaving this one in, since we may want to persist certificates that arrive via RPC.

@mwtian mwtian merged commit f34b482 into MystenLabs:main Dec 10, 2022
@mwtian mwtian deleted the exec-for-effects branch December 10, 2022 19:08
@mwtian
Copy link
Contributor Author

mwtian commented Dec 10, 2022

Yeah I think this is related to one of our previous discussions, i.e. whether TransactionManager should manage the pending certificates itself, or have Narwhal replay certificates for crash recovery. The considerations are similar.

Approach 1: TransactionManager manages certificates pending execution

This is close to the current approach. Executions can happen in parallel, so transactions do not finish executions in order. Narwhal can avoid the complexity of keeping track of the low watermark of executed certificates, which would be necessary if Narwhal is responsible to replay pending certificates. I wonder if checkpoint may be able to avoid some complexity too by just relying on TransactionManager to manage pending certificates. For user submitted RPCs, keeping track of pending certificates may speed up crash recovery a little.

Approach 2: Narwhal and Checkpoint manage certificates pending execution

In this approach, we would be able to potentially save a few disk writes in TransactionManager. The downside is that both Narwhal and Checkpoint need to keep track of and persist lower watermark of executed transactions, which may increase complexity.

An intermediate approach, e.g. TransactionManager only manages pending certificates from Narwhal, seems less ideal to either approach above.

My current preference is approach 1, because it seems simpler. But there could be other reasons why checkpointing needs to keep track of low watermark of executed transactions. We can continue the discussion next week and see what we can improve here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants