[move] Benchmarking historical transactions #15329
Conversation
Nice work!
Left a bunch of comments to consider.
```toml
[package]
name = "aptos-replay-benchmark"
version = "0.1.0"
description = "A tool to replay and locally benchmark on-chain transactions."
```
Is it possible to add any tests at all (that also run in the CI), so that we know if other changes cause this tool to break?
I think this

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn verify_tool() {
        use clap::CommandFactory;
        Command::command().debug_assert();
    }
}
```

allows us to test commands exhaustively. For CI testing, the plan is to actually use this tool on CI to replay txns and benchmark performance (to track regressions, etc., similar to the single-node-performance suite we have now), so once that is in place, the tool will be "tested" because we use it. Will add a TODO for now, as I will probably add a few CLI options to download and read transactions/state overrides.
Would it make sense to also add an option to profile gas locally, so that one can look at gas cost distribution and trace in more depth (similar to https://aptos.dev/en/build/cli/working-with-move-contracts/local-simulation-benchmarking-and-gas-profiling#overview and #15304)?
Yes, I think our tooling would benefit from more unified CLI features (e.g., here, comparison-e2e-testing can re-use `Diff`s, etc.). For gas it is a bit tricky here because the gas profiler is created inside the VM, so the block executor does not have any context (unless we create a special entry point).
In the future I would say we want to unify this, log gas/sec in addition to pure time, etc.
This is great!
I'd add a README.md file to `aptos-replay-benchmark`.
Also, we had a discussion before about no longer using the `aptos-` prefix for `aptos-move` subdirectories (as it's just noise); the consistency is already broken, so we can continue with non-`aptos-`-prefixed ones.
```rust
/// Holds configuration for running the benchmarks and measuring the time taken.
pub struct BenchmarkRunner {
    concurrency_levels: Vec<usize>,
    num_repeats: usize,
```
You might want to have `num_warmup_repeats` and `num_measured_repeats` here for more flexible measurements.
Good point - I think also for warm-up txns? We probably want to execute A to B, and then B to C, and measure B to C only, so we simulate a "cached" environment.
Yes, that's what I was suggesting: first `num_warmup_repeats` (A to B), then the measured ones (B+1 to C). We can then simulate a cached environment, as well as a fully cold one. We might also want the warm-up count to be large enough for CPUs with speed stepping...
Added that in a slightly different way: the user can specify how many blocks to skip. This way we avoid weird cases where versions are not block boundaries (which we probably should enforce).
I guess for good benchmarks we do want to disable dynamic frequency scaling anyway.
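For illustration, here is a minimal sketch of the warm-up/measured split discussed above; all field and type names are hypothetical stand-ins, not what the PR actually implements:

```rust
use std::time::Instant;

// Stub standing in for the PR's block type; illustrative only.
struct Block;

impl Block {
    fn run(&self, _concurrency_level: usize) {
        // Execute all transactions in the block.
    }
}

// Hypothetical split of `num_repeats` into warm-up and measured phases.
pub struct BenchmarkRunner {
    concurrency_levels: Vec<usize>,
    num_warmup_repeats: usize,
    num_measured_repeats: usize,
}

impl BenchmarkRunner {
    pub fn bench(&self, blocks: &[Block]) {
        for &concurrency_level in &self.concurrency_levels {
            // Warm-up runs populate caches (modules, environment) and give
            // frequency-scaling CPUs time to settle; they are not timed.
            for _ in 0..self.num_warmup_repeats {
                blocks.iter().for_each(|b| b.run(concurrency_level));
            }
            // Only these runs contribute to the reported numbers.
            for _ in 0..self.num_measured_repeats {
                let start = Instant::now();
                blocks.iter().for_each(|b| b.run(concurrency_level));
                println!(
                    "concurrency {}: {} ms",
                    concurrency_level,
                    start.elapsed().as_millis()
                );
            }
        }
    }
}
```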
```rust
let start_time = Instant::now();
block.run(&executor, concurrency_level);
let time = start_time.elapsed().as_millis();
println!(
```
optional - maybe log instead of println!
The reason why I avoid logs is that they are very verbose on the parallel execution/API/storage side: we log when speculative logs are flushed, when Block-STM finishes execution, etc. - hence the default Error log level.
Do you mean log to a file? Ideally, we can have an option to save results in a CSV so we can re-use them, but printing seems good enough, as we can have a script invoking the executable and piping stdout into a file in the right format.
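For what it's worth, a tiny sketch of the stdout-as-CSV idea; the binary name and column layout below are made up:

```rust
// Illustrative only: emit one CSV row per measurement, so that
// `replay-benchmark ... > results.csv` (hypothetical invocation)
// captures everything in a reusable format.
fn report_csv(concurrency_level: usize, repeat: usize, time_ms: u128) {
    // columns: concurrency,repeat,time_ms
    println!("{},{},{}", concurrency_level, repeat, time_ms);
}
```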
Yeah, ideally our logger could have specialized streams, but alas it doesn't.
great work
```rust
pub(crate) fn get_state_override(
    &self,
    state_view: &impl StateView,
) -> HashMap<StateKey, StateValue> {
```
I think one can argue that we should return a pair of state override key and value, since it's always exactly one. A HashMap makes this aspect confusing, and the whole override could potentially include more things than features, I suppose (now or later).
I plan to extend it beyond features, hence the hashmap. Ideally, reading from a path where we previously generated and saved overrides for state keys, e.g., framework code.
Gas feature version is also interesting, but it affects the behaviour more. The plan is to allow users to not consider outputs as different if the diffs only contain gas-payment-related changes.
If we aren't going to extend it immediately, it makes sense to have a restricted API and then extend it with that PR, but feel free to keep it as is - I don't think it's too important here. My main concern was that using the same type for partial and complete overrides can be confusing, and is better to avoid as long as possible.
What do you mean by partial / complete override?
Imagine we have multiple pieces that can override the state. Say I had two functions, for different use-cases/reasons, and one of them only produced a single key-value pair. It seems safer to return (key, value) from that API, as there is then no chance to confuse that (partial) override with the final collection that's used to override (the HashMap). But if there are also APIs that return similar collections, etc., the distinction becomes not that clear or useful.
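A small sketch of the distinction being argued for, with stub types standing in for the real `StateKey`/`StateValue`:

```rust
use std::collections::HashMap;

// Stubs standing in for the real aptos-types; illustrative only.
#[derive(PartialEq, Eq, Hash)]
struct StateKey;
struct StateValue;

// A producer of a single override is unambiguous about its shape...
fn features_override() -> (StateKey, StateValue) {
    (StateKey, StateValue)
}

// ...while only the final, assembled override is a collection.
fn assemble_overrides(parts: Vec<(StateKey, StateValue)>) -> HashMap<StateKey, StateValue> {
    parts.into_iter().collect()
}
```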
Ah, I see! Let's land this - I am about to create a follow-up anyway to read overrides from state, so that we can override the framework, and I will rework some of those APIs. Basically, I think we can have options to (see the sketch after this list):

- Override features (CLI)
- Override gas feature version (CLI)
- Override compiled modules (point to a compiled Move package) (CLI)
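A rough sketch of what those options could look like with a clap derive; every flag name here is hypothetical, not what the follow-up actually adds:

```rust
use clap::Parser;
use std::path::PathBuf;

/// Hypothetical override flags; illustrative only.
#[derive(Parser)]
struct OverrideArgs {
    /// Features to force-enable, by name.
    #[clap(long)]
    enable_features: Vec<String>,
    /// Features to force-disable, by name.
    #[clap(long)]
    disable_features: Vec<String>,
    /// Gas feature version to run with instead of the on-chain one.
    #[clap(long)]
    gas_feature_version: Option<u64>,
    /// Paths to compiled Move packages whose modules override on-chain code.
    #[clap(long)]
    override_packages: Vec<PathBuf>,
}
```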
Go go
Description
This PR introduces a tool to benchmark past transactions (and correctly, unlike the existing `aptos-debugger move execute-past-transactions ...`).

Summary
The Aptos debugger tool uses an RPC under the hood when running transactions, which means state view access latencies can be huge. Also, it only executes transactions in blocks, without shared caches across blocks like the real executor has.
This PR introduces a better tool to benchmark execution of past transactions. The user has to provide the first and last versions of the interval that needs to be benchmarked. The tool partitions transactions in the specified closed interval into blocks, and runs all blocks end-to-end, measuring the overall time. During this run, the executor is shared (and so are the environment and module caches).
There is no commit here, only execution time. For each block, we maintain the state (a read-set estimated from one run before the benchmarks) on top of which it should run. And so, the benchmark just runs the blocks in sequence on top of their initial states (outputs are only used for comparison against the on-chain data).
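A rough sketch of this flow (every type and function below is an illustrative stub, not the tool's actual API):

```rust
use std::time::Instant;

// Stubs standing in for the real types; everything here is illustrative.
struct Txn;
struct Block(Vec<Txn>);
struct State; // per-block initial state, estimated from a pre-run's read-set
struct Output;
struct SharedExecutor; // shared across blocks: environment and module caches

impl SharedExecutor {
    fn execute(&self, _block: &Block, _state: &State) -> Output {
        Output
    }
}

fn fetch_transactions(_begin: u64, _end: u64) -> Vec<Txn> { vec![] }
fn partition_into_blocks(txns: Vec<Txn>) -> Vec<Block> { vec![Block(txns)] }
fn estimate_read_set(_block: &Block) -> State { State }
fn compare_with_onchain(_output: &Output) { /* diff against on-chain outputs */ }

fn benchmark(begin_version: u64, end_version: u64) {
    // Partition the closed version interval into blocks.
    let blocks = partition_into_blocks(fetch_transactions(begin_version, end_version));

    // One pre-run per block estimates its read-set; those key-value pairs
    // become the fixed initial state each block is later run on top of.
    let states: Vec<State> = blocks.iter().map(estimate_read_set).collect();

    // Run blocks in sequence on a shared executor; only time is recorded,
    // nothing is committed, and outputs are only compared with on-chain data.
    let executor = SharedExecutor;
    for (block, state) in blocks.iter().zip(&states) {
        let start = Instant::now();
        let output = executor.execute(block, state);
        println!("block time: {} ms", start.elapsed().as_millis());
        compare_with_onchain(&output);
    }
}
```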
The tool allows one to override configs to experiment with new features, or with what execution would look like without some features. For now, we support:

- enabling features
- disabling features

In the future, we can add more overrides: gas schedule, modules, etc.
It also computes the diffs between the expected and the new (overridden) outputs, e.g.:
Example
For example, say we have a new feature, e.g., `ENABLE_LOADER_V2`. We can benchmark how historical transactions perform with this flag on and off.

The flag is off by default:

With the flag on:

Great - we can now quantify the effect of the feature on runtime.
Other related changes
- Changed the logger's `Level` to be usable from the CLI. The behaviour should be the same.
- Replaced `BlockAptosVM::execute_block` with `AptosVMBlockExecutor::new().execute_block` where possible (benchmark, debugger), so that we use the high-level wrapper and not the inner type.
Manually running benchmarks.
Key Areas to Review
N/A; probably checking that the logger's `Level` is still correct.