[move] Benchmarking historical transactions #15329

Merged
merged 9 commits into from
Dec 6, 2024

Contributor

@georgemitenkov georgemitenkov commented Nov 20, 2024

Description

This PR introduces a tool that benchmarks past transactions correctly, unlike the existing aptos-debugger move execute-past-transactions ... command.

Summary

The Aptos debugger tool uses RPC under the hood when running transactions, so state view access latencies can be huge. It also only executes transactions in blocks, without sharing caches across blocks the way the real executor does.

This PR introduces a better tool to benchmark execution of past transactions. The user provides the first and last versions of the interval to benchmark. The tool partitions the transactions in that closed interval into blocks and runs all blocks end-to-end, measuring the overall time. During this run, the executor is shared (and so are the environment and module caches).
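As a rough illustration of the partitioning step, the sketch below splits a closed version interval into closed per-block ranges. The description does not say how block boundaries are chosen, so this assumes they are supplied as a sorted list of versions at which a new block starts; `partition` and `block_starts` are hypothetical names, not the tool's API.

```rust
/// Splits the closed interval [begin, end] into closed sub-intervals.
///
/// `block_starts` is an assumed input: the versions at which a new block
/// begins, sorted in ascending order. Starts outside (begin, end] are ignored.
pub fn partition(begin: u64, end: u64, block_starts: &[u64]) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    if begin > end {
        return ranges;
    }
    let mut first = begin;
    for &start in block_starts {
        if start <= first || start > end {
            continue;
        }
        // Close the current block right before the next block starts.
        ranges.push((first, start - 1));
        first = start;
    }
    // The last block always ends at the end of the interval.
    ranges.push((first, end));
    ranges
}
```

With starts at 1944524532, 1944524542 and 1944524547 and an interval ending at 1944524552, this yields the same three ranges as the first three blocks in the example output.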

Nothing is committed here; only execution is timed. For each block, we maintain the state on top of which it should run (the read-set, estimated from one run before the benchmarks). The benchmark then simply runs the blocks in sequence on top of their initial states (outputs are only used for comparison against the on-chain data).
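The measurement loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's code: `Block`, `measure` and `run_block` are hypothetical names, and the real executor call is replaced by a caller-supplied closure.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a block of transactions together with the
/// pre-computed state it runs on top of.
pub struct Block {
    pub first_version: u64,
    pub last_version: u64,
}

/// Runs all blocks in sequence `num_repeats` times and returns the wall-clock
/// time of each repeat. Nothing is committed: `run_block` only executes.
pub fn measure<F>(blocks: &[Block], num_repeats: usize, mut run_block: F) -> Vec<Duration>
where
    F: FnMut(&Block),
{
    let mut times = Vec::with_capacity(num_repeats);
    for _ in 0..num_repeats {
        let start = Instant::now();
        for block in blocks {
            run_block(block);
        }
        times.push(start.elapsed());
    }
    times
}
```

Because each block carries its own initial state, a repeat does not depend on the outputs of the previous one, which is what makes running the same blocks several times meaningful.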

The tool allows one to override configs to experiment with new features, or to see what execution would look like without some features. For now, we support:

  • enabling features
  • disabling features

In the future, we can add more overrides: gas schedule, modules, etc.
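One way to picture the feature overrides is as set operations on the on-chain feature set. The sketch below is only an illustration: the `Feature` enum, `apply_overrides`, and the rule that a feature cannot be both enabled and disabled are assumptions, not behaviour confirmed by this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical feature flags; the real tool works with the on-chain features.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Feature {
    EnableLoaderV2,
    SomeOtherFeature,
}

/// Returns the feature set to benchmark with: the on-chain set, plus the
/// enabled features, minus the disabled ones.
pub fn apply_overrides(
    on_chain: &BTreeSet<Feature>,
    enable: &[Feature],
    disable: &[Feature],
) -> Result<BTreeSet<Feature>, String> {
    // Assumed sanity check: reject contradictory overrides.
    if let Some(f) = enable.iter().find(|f| disable.contains(*f)) {
        return Err(format!("{:?} cannot be both enabled and disabled", f));
    }
    let mut features = on_chain.clone();
    features.extend(enable.iter().copied());
    for f in disable {
        features.remove(f);
    }
    Ok(features)
}
```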

It also computes the diffs between expected and new overridden outputs, e.g.:

Transaction 1944524533 diff:
  >>>>>
[gas used] before: 525, after: 517
[event] 0000000000000000000000000000000000000000000000000000000000000001::transaction_fee::FeeStatement has changed its data
[write] StateKey::AccessPath { address: 0xc05e013f9f81d699e5a991bd134ab8a9af4ec609714f1fd8bfce4fc38419c890, path: "Resource(0x1::coin::CoinStore<0x1::aptos_coin::AptosCoin>)" } has changed its value
[write] StateKey::TableItem { handle: 1b854694ae746cdbd8d44186ca4929b2b337df21d1c74633be19b2710552fdca, key: 0619dc29a0aac8fa146714058e8dd6d2d0f3bdf5f6331907bf91f3acd81e6935 } has changed its value
[total gas used] before: 525, after: 517
 <<<<<
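A comparison like the one printed above could be sketched as follows. The `Output` type is a heavily simplified, hypothetical stand-in (string keys, raw byte values, no events), and the "is missing" and "is new" messages are invented for illustration; only the "[gas used]" and "has changed its value" lines mirror the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified transaction output: gas used plus a write-set.
#[derive(Debug)]
pub struct Output {
    pub gas_used: u64,
    pub writes: BTreeMap<String, Vec<u8>>,
}

/// Returns one human-readable line per difference between the expected
/// (on-chain) output and the output produced with overrides applied.
pub fn diff(before: &Output, after: &Output) -> Vec<String> {
    let mut lines = Vec::new();
    if before.gas_used != after.gas_used {
        lines.push(format!(
            "[gas used] before: {}, after: {}",
            before.gas_used, after.gas_used
        ));
    }
    for (key, old) in &before.writes {
        match after.writes.get(key) {
            Some(new) if new == old => {}
            Some(_) => lines.push(format!("[write] {} has changed its value", key)),
            None => lines.push(format!("[write] {} is missing", key)),
        }
    }
    for key in after.writes.keys() {
        if !before.writes.contains_key(key) {
            lines.push(format!("[write] {} is new", key));
        }
    }
    lines
}
```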

Example

For example, say we have a new feature, e.g., ENABLE_LOADER_V2. We can benchmark how historical transactions perform with this flag on/off.

The flag is off by default:

target/release/aptos-replay-benchmark --begin-version 1944524532 \
  --end-version 1944524714 --rest-endpoint https://mainnet.aptoslabs.com/v1 \
  --num-repeats 10 --concurrency-levels 8

Got 100/183 txns from RestApi.
Got 183/183 txns from RestApi.
Generating blocks for benchmarking ...
Checking generated blocks ...
Analyzing 24 generated blocks ...
Block 1: versions [1944524532, 1944524541] with 10 transactions
Block 2: versions [1944524542, 1944524546] with 5 transactions
Block 3: versions [1944524547, 1944524552] with 6 transactions
Block 4: versions [1944524553, 1944524560] with 8 transactions
Block 5: versions [1944524561, 1944524565] with 5 transactions
Block 6: versions [1944524566, 1944524572] with 7 transactions
Block 7: versions [1944524573, 1944524577] with 5 transactions
Block 8: versions [1944524578, 1944524587] with 10 transactions
Block 9: versions [1944524588, 1944524597] with 10 transactions
Block 10: versions [1944524598, 1944524602] with 5 transactions
Block 11: versions [1944524603, 1944524609] with 7 transactions
Block 12: versions [1944524610, 1944524616] with 7 transactions
Block 13: versions [1944524617, 1944524624] with 8 transactions
Block 14: versions [1944524625, 1944524632] with 8 transactions
Block 15: versions [1944524633, 1944524639] with 7 transactions
Block 16: versions [1944524640, 1944524647] with 8 transactions
Block 17: versions [1944524648, 1944524656] with 9 transactions
Block 18: versions [1944524657, 1944524663] with 7 transactions
Block 19: versions [1944524664, 1944524668] with 5 transactions
Block 20: versions [1944524669, 1944524680] with 12 transactions
Block 21: versions [1944524681, 1944524687] with 7 transactions
Block 22: versions [1944524688, 1944524695] with 8 transactions
Block 23: versions [1944524696, 1944524705] with 10 transactions
Block 24: versions [1944524706, 1944524714] with 9 transactions
Benchmarking ...

Concurrency level: 8
[1/10] Execution time is 1717ms
[2/10] Execution time is 1540ms
[3/10] Execution time is 1537ms
[4/10] Execution time is 1606ms
[5/10] Execution time is 1629ms
[6/10] Execution time is 1668ms
[7/10] Execution time is 1619ms
[8/10] Execution time is 1479ms
[9/10] Execution time is 1690ms
[10/10] Execution time is 1515ms
Median execution time is 1619ms

With the flag on:

target/release/aptos-replay-benchmark --begin-version 1944524532 \
  --end-version 1944524714 --rest-endpoint https://mainnet.aptoslabs.com/v1 \
  --num-repeats 10 --concurrency-levels 8 --enable-features ENABLE_LOADER_V2

Got 100/183 txns from RestApi.
Got 183/183 txns from RestApi.
Generating blocks for benchmarking ...
Checking generated blocks ...
Analyzing 24 generated blocks ...
Block 1: versions [1944524532, 1944524541] with 10 transactions
Block 2: versions [1944524542, 1944524546] with 5 transactions
Block 3: versions [1944524547, 1944524552] with 6 transactions
Block 4: versions [1944524553, 1944524560] with 8 transactions
Block 5: versions [1944524561, 1944524565] with 5 transactions
Block 6: versions [1944524566, 1944524572] with 7 transactions
Block 7: versions [1944524573, 1944524577] with 5 transactions
Block 8: versions [1944524578, 1944524587] with 10 transactions
Block 9: versions [1944524588, 1944524597] with 10 transactions
Block 10: versions [1944524598, 1944524602] with 5 transactions
Block 11: versions [1944524603, 1944524609] with 7 transactions
Block 12: versions [1944524610, 1944524616] with 7 transactions
Block 13: versions [1944524617, 1944524624] with 8 transactions
Block 14: versions [1944524625, 1944524632] with 8 transactions
Block 15: versions [1944524633, 1944524639] with 7 transactions
Block 16: versions [1944524640, 1944524647] with 8 transactions
Block 17: versions [1944524648, 1944524656] with 9 transactions
Block 18: versions [1944524657, 1944524663] with 7 transactions
Block 19: versions [1944524664, 1944524668] with 5 transactions
Block 20: versions [1944524669, 1944524680] with 12 transactions
Block 21: versions [1944524681, 1944524687] with 7 transactions
Block 22: versions [1944524688, 1944524695] with 8 transactions
Block 23: versions [1944524696, 1944524705] with 10 transactions
Block 24: versions [1944524706, 1944524714] with 9 transactions
Benchmarking ...

Concurrency level: 8
[1/10] Execution time is 899ms
[2/10] Execution time is 905ms
[3/10] Execution time is 901ms
[4/10] Execution time is 903ms
[5/10] Execution time is 905ms
[6/10] Execution time is 905ms
[7/10] Execution time is 899ms
[8/10] Execution time is 896ms
[9/10] Execution time is 898ms
[10/10] Execution time is 904ms
Median execution time is 903ms

Great: we can now quantify the effect of the feature on runtime. In this run, the median execution time drops from 1619ms to 903ms, roughly a 1.8x speedup.
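The reported medians in the two runs above are consistent with taking the element at index `len / 2` of the sorted times (the upper median for an even number of repeats). That is an inference from the printed numbers, not something the PR states, and `median_ms` and `speedup` below are hypothetical helpers.

```rust
/// Returns the element at index `len / 2` of the sorted samples (the upper
/// median for an even count), or `None` if there are no samples.
pub fn median_ms(samples: &[u64]) -> Option<u64> {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    sorted.get(sorted.len() / 2).copied()
}

/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate is faster.
pub fn speedup(baseline_ms: u64, candidate_ms: u64) -> f64 {
    baseline_ms as f64 / candidate_ms as f64
}
```

Feeding in the ten times from each run gives 1619 and 903 respectively, and a ratio of about 1.79.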

Other related changes

  1. Refactored logging Level so that it can be used from the CLI. The behaviour should be the same.
  2. Replaced BlockAptosVM::execute_block with AptosVMBlockExecutor::new().execute_block where possible (benchmark, debugger) so that we use the high-level wrapper, and not the inner type.

How Has This Been Tested?

Manually running benchmarks.

Key Areas to Review

N/A; it is probably worth checking that the logger's Level is still correct.

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Move/Aptos Virtual Machine
  • Developer Infrastructure

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Nov 20, 2024

⏱️ 2h 53m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
rust-cargo-deny 23m 🟩🟩🟩🟩🟩 (+8 more)
check-dynamic-deps 14m 🟩🟩🟩🟩🟩 (+8 more)
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 12m 🟩
rust-move-tests 12m 🟩
rust-move-tests 12m 🟩
rust-move-tests 8m
general-lints 7m 🟩🟩🟩🟩🟩 (+8 more)
semgrep/ci 6m 🟩🟩🟩🟩🟩 (+8 more)
rust-move-tests 3m
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+8 more)

Contributor Author

georgemitenkov commented Nov 20, 2024

@georgemitenkov georgemitenkov changed the title [refactoring] Use AptosVMBlockExecutor where possible [aptos-debugger] Correct benchmark via debugger Nov 20, 2024
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch from 5f0697c to 9d3bc98 Compare November 20, 2024 18:04
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-todos-script-location branch from 3b035a7 to 429acfd Compare November 20, 2024 18:06
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch 2 times, most recently from 4beb0d6 to 67ec63d Compare November 20, 2024 18:08
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-todos-script-location branch from 429acfd to a351639 Compare November 20, 2024 21:16
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch from 67ec63d to 2d8c42b Compare November 20, 2024 21:17
Base automatically changed from george/loader-v2-todos-script-location to main November 20, 2024 21:51
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch 2 times, most recently from 9ff5305 to 8d71d53 Compare November 21, 2024 16:01
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch 3 times, most recently from bc4d63b to e5d9058 Compare November 21, 2024 16:26
@georgemitenkov georgemitenkov marked this pull request as ready for review November 21, 2024 16:42
@georgemitenkov georgemitenkov changed the title [aptos-debugger] Correct benchmark via debugger [aptos-replay-benhcmark] Bencmarking historical transactions Nov 21, 2024
Contributor

@vineethk vineethk left a comment

Nice work!

Left a bunch of comments to consider.

crates/aptos-logger/src/metadata.rs (outdated, resolved)
aptos-move/aptos-vm/src/aptos_vm.rs (resolved)
[package]
name = "aptos-replay-benchmark"
version = "0.1.0"
description = "A tool to replay and locally benchmark on-chain transactions."
Contributor

Is it possible to add any tests at all (that also run in the CI), so that we know if other changes cause this tool to break?

Contributor Author

I think this

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn verify_tool() {
        use clap::CommandFactory;
        Command::command().debug_assert();
    }
}

allows us to test commands exhaustively. For CI testing, the plan is to actually use this tool on CI to replay txns and benchmark performance (to track regressions, etc. similar to single-node-performance we have now), so once it is there, it will be "tested" because we use it. Will add a TODO for now, as I will probably add a few options to CLI to download and read transactions/state override.

@@ -0,0 +1,30 @@
[package]
name = "aptos-replay-benchmark"
Contributor

Would it make sense to also add an option to profile gas locally, so that one can look at gas cost distribution and trace in more depth (similar to https://aptos.dev/en/build/cli/working-with-move-contracts/local-simulation-benchmarking-and-gas-profiling#overview and #15304)?

Contributor Author

Yes, I think our tooling would benefit from more unified CLI features (e.g., here, comparison-e2e-testing can re-use Diffs, etc.). For gas it is a bit tricky because the gas profiler is created inside the VM, so the block executor does not have any context (unless we create a special entry point).

I would say in the future we would want to unify this, log gas/sec in addition to pure time, etc.

aptos-move/aptos-replay-benchmark/src/main.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/comparison.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/generator.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/overrides.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/overrides.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/workload.rs (outdated, resolved)
@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch 2 times, most recently from 290e210 to b73fc74 Compare December 4, 2024 10:53
Contributor

@ziaptos ziaptos left a comment

This is great!

I'd add a README.md file to aptos-replay-benchmark.

Also, we discussed before that we should stop using the aptos- prefix for aptos-move subdirectories (it's just noise). The consistency is already broken, so we can continue with non-aptos-prefixed names.

aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
/// Holds configuration for running the benchmarks and measuring the time taken.
pub struct BenchmarkRunner {
    concurrency_levels: Vec<usize>,
    num_repeats: usize,
Contributor

you might want to have num_warmup_repeats and num_measured_repeats here for more flexible measurements.

Contributor Author

Good point, I think also for warm-up txns? We probably want to execute A to B, and then B to C and measure B to C only, so we simulate "cached" environment

Contributor

Yes, that's what I was suggesting, first num_warmup_repeats (A to B), then number measured (B+1 to C). we can then simulate cached environment, as well as fully cold environment. We might also want num warmup to be large enough for CPUs with speed stepping...

Contributor Author

Added that in a slightly different way: user can specify how many blocks to skip. This way we avoid weird cases where versions are not block boundaries (which we probably should enforce)

I guess for good benchmarks we do want to disable dynamic frequency scaling anyway.
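The "skip some blocks" approach to warm-up can be sketched like this: the skipped blocks are still executed so that caches get populated, but only the remaining blocks are timed. `measure_after_warmup` and `num_blocks_to_skip` are hypothetical names chosen for illustration, and the executor is again a caller-supplied closure.

```rust
use std::time::{Duration, Instant};

/// Executes the first `num_blocks_to_skip` blocks untimed (warm-up), then
/// times the rest. Returns the elapsed time and the number of timed blocks.
pub fn measure_after_warmup<B, F>(
    blocks: &[B],
    num_blocks_to_skip: usize,
    mut run_block: F,
) -> (Duration, usize)
where
    F: FnMut(&B),
{
    // Clamp so that skipping more blocks than exist simply times nothing.
    let skip = num_blocks_to_skip.min(blocks.len());
    let (warmup, measured) = blocks.split_at(skip);
    for block in warmup {
        run_block(block);
    }
    let start = Instant::now();
    for block in measured {
        run_block(block);
    }
    (start.elapsed(), measured.len())
}
```

Skipping whole blocks rather than versions keeps the timed region aligned with block boundaries, which is the motivation given in the comment above.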

aptos-move/aptos-replay-benchmark/src/runner.rs (outdated, resolved)
let start_time = Instant::now();
block.run(&executor, concurrency_level);
let time = start_time.elapsed().as_millis();
println!(
Contributor

optional - maybe log instead of println!

Contributor Author

@georgemitenkov georgemitenkov Dec 4, 2024

The reason why I avoid logs is that they are very verbose on parallel execution/API/storage side: we log when speculative logs are flushed, Block-STM finishes execution, etc. - hence the default Error log level.

Do you mean log to a file? Ideally, we can have an option to save results in a CSV so we can re-use, but printing seems good enough as we can have a script invoking the executable and piping stdout into a file in the right format.
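If results were to be emitted as CSV on stdout, as floated here, the formatting could look roughly like the sketch below. The column names and the `to_csv` helper are made up for illustration; the tool as merged prints plain text lines.

```rust
/// Formats per-repeat execution times as CSV, one row per repeat.
/// The header and column order are assumptions made for this sketch.
pub fn to_csv(concurrency_level: usize, times_ms: &[u128]) -> String {
    let mut out = String::from("concurrency_level,repeat,time_ms\n");
    for (i, time) in times_ms.iter().enumerate() {
        // Repeats are numbered from 1, matching the "[1/10]" style output.
        out.push_str(&format!("{},{},{}\n", concurrency_level, i + 1, time));
    }
    out
}
```

A wrapper script could then redirect stdout into a file, with the verbose logger output kept separate by the default Error log level.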

Contributor

yeah, ideally our logger could have specialized streams but alas it doesn't.

aptos-move/aptos-replay-benchmark/src/runner.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/runner.rs (outdated, resolved)
Contributor

@gelash gelash left a comment

great work

aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/workload.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/block.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/runner.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/state_view.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/state_view.rs (outdated, resolved)
aptos-move/aptos-replay-benchmark/src/overrides.rs (outdated, resolved)
pub(crate) fn get_state_override(
    &self,
    state_view: &impl StateView,
) -> HashMap<StateKey, StateValue> {
Contributor

I think one can argue that we should return a pair of state override key and value, since it's always one. HashMap makes this aspect confusing, and also the whole override potentially could include more things than features, I suppose (now or later)

Contributor Author

I plan to extend it beyond features, hence the hashmap. Ideally, we would read from a path where we previously generated and saved overrides for state keys, e.g., framework code.

Contributor Author

Gas feature version is also interesting, but affects the behaviour more. The plan is to allow users to not consider outputs as different if diffs only contain gas payment related changes.

Contributor

If we aren't going to extend immediately, it makes sense to have restricted API then extend with that PR, but feel free to keep as well, I don't think it's too important here. My main concern was types for partial and complete overrides being the same can be confusing and better to avoid as long as possible

Contributor Author

What do you mean by partial / complete override?

Contributor

imagine we have multiple pieces that can override the state. Say I had two functions, for different use-cases / reasons, and one of them only produced a single key-value pair. It seems to me safer to return (key, value) from that API as there is no chance to confuse that (partial) override with the final collection that's used to override (HashMap). But if there are also APIs that return similar collections, etc, the distinction becomes not that clear and useful.
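The distinction being drawn here could be expressed with two separate types, so that a single key-value override cannot be mistaken for the final collection. The sketch below is one possible shape, not what the PR implements; `PartialOverride`, `StateOverride`, and the `String` / `Vec<u8>` stand-ins for state keys and values are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real state key and value types.
type Key = String;
type Value = Vec<u8>;

/// A single key-value pair produced by one source of overrides.
pub struct PartialOverride {
    pub key: Key,
    pub value: Value,
}

/// The complete override that is actually applied on top of the state.
pub struct StateOverride(HashMap<Key, Value>);

impl StateOverride {
    /// Combines partial overrides; a later entry for the same key wins.
    pub fn from_parts(parts: impl IntoIterator<Item = PartialOverride>) -> Self {
        let mut map = HashMap::new();
        for part in parts {
            map.insert(part.key, part.value);
        }
        StateOverride(map)
    }

    /// Looks up the overridden value for a key, if any.
    pub fn get(&self, key: &str) -> Option<&Value> {
        self.0.get(key)
    }

    /// Number of overridden keys.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```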

Contributor Author

Ah I see! Let's land this - I am about to create a follow up anyway to read override from state, so we can override framework, so I will rework some of those APIs. Basically, I think we can have an option to

  1. Override features (CLI)
  2. Override gas feature version (CLI)
  3. Override compiled modules (point to compiled move package) (CLI)

@georgemitenkov georgemitenkov force-pushed the george/loader-v2-benchmark branch from b73fc74 to d56ae9d Compare December 6, 2024 17:27
Contributor

@ziaptos ziaptos left a comment

Go go

@georgemitenkov georgemitenkov enabled auto-merge (squash) December 6, 2024 20:44

Contributor

github-actions bot commented Dec 6, 2024

✅ Forge suite realistic_env_max_load success on 1ad215f800faf6ee9daddb6f29ddfb97df02b25d

two traffics test: inner traffic : committed: 14985.51 txn/s, latency: 2654.39 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3000 ms), latency samples: 5697840
two traffics test : committed: 99.92 txn/s, latency: 1409.65 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 3000 ms), latency samples: 1860
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.548, avg: 1.485", "ConsensusProposalToOrdered: max: 0.335, avg: 0.288", "ConsensusOrderedToCommit: max: 0.361, avg: 0.351", "ConsensusProposalToCommit: max: 0.647, avg: 0.640"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.85s no progress at version 29519 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.66s no progress at version 2510722 (avg 0.66s) [limit 16].
Test Ok

Contributor

github-actions bot commented Dec 6, 2024

✅ Forge suite compat success on 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d

Compatibility test results for 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d (PR)
1. Check liveness of validators at old version: 17f4b41fb7157192dd4980b292843d84c518ea70
compatibility::simple-validator-upgrade::liveness-check : committed: 17186.63 txn/s, latency: 1935.94 ms, (p50: 1800 ms, p70: 1900, p90: 2400 ms, p99: 6000 ms), latency samples: 574200
2. Upgrading first Validator to new version: 1ad215f800faf6ee9daddb6f29ddfb97df02b25d
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7798.26 txn/s, latency: 3690.05 ms, (p50: 3800 ms, p70: 4000, p90: 4200 ms, p99: 4300 ms), latency samples: 141980
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 8113.48 txn/s, latency: 4060.49 ms, (p50: 4300 ms, p70: 4400, p90: 4500 ms, p99: 4700 ms), latency samples: 275980
3. Upgrading rest of first batch to new version: 1ad215f800faf6ee9daddb6f29ddfb97df02b25d
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7942.21 txn/s, latency: 3568.33 ms, (p50: 4000 ms, p70: 4100, p90: 4400 ms, p99: 4700 ms), latency samples: 145300
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 8416.80 txn/s, latency: 3871.83 ms, (p50: 4100 ms, p70: 4200, p90: 4500 ms, p99: 4800 ms), latency samples: 280880
4. upgrading second batch to new version: 1ad215f800faf6ee9daddb6f29ddfb97df02b25d
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 14401.06 txn/s, latency: 1907.94 ms, (p50: 2100 ms, p70: 2200, p90: 2300 ms, p99: 2400 ms), latency samples: 242220
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 14466.31 txn/s, latency: 2165.71 ms, (p50: 2200 ms, p70: 2300, p90: 2300 ms, p99: 2500 ms), latency samples: 464700
5. check swarm health
Compatibility test for 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d passed
Test Ok

Contributor

github-actions bot commented Dec 6, 2024

✅ Forge suite framework_upgrade success on 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d

Compatibility test results for 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d (PR)
Upgrade the nodes to version: 1ad215f800faf6ee9daddb6f29ddfb97df02b25d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1464.82 txn/s, submitted: 1467.25 txn/s, failed submission: 2.43 txn/s, expired: 2.43 txn/s, latency: 2022.40 ms, (p50: 1800 ms, p70: 2100, p90: 2500 ms, p99: 4000 ms), latency samples: 132800
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1557.49 txn/s, submitted: 1559.71 txn/s, failed submission: 2.21 txn/s, expired: 2.21 txn/s, latency: 1951.79 ms, (p50: 2000 ms, p70: 2100, p90: 2400 ms, p99: 3600 ms), latency samples: 140720
5. check swarm health
Compatibility test for 17f4b41fb7157192dd4980b292843d84c518ea70 ==> 1ad215f800faf6ee9daddb6f29ddfb97df02b25d passed
Upgrade the remaining nodes to version: 1ad215f800faf6ee9daddb6f29ddfb97df02b25d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1504.64 txn/s, submitted: 1508.24 txn/s, failed submission: 3.60 txn/s, expired: 3.60 txn/s, latency: 2021.64 ms, (p50: 2100 ms, p70: 2100, p90: 2700 ms, p99: 3700 ms), latency samples: 133920
Test Ok

@georgemitenkov georgemitenkov merged commit 809457f into main Dec 6, 2024
88 checks passed
@georgemitenkov georgemitenkov deleted the george/loader-v2-benchmark branch December 6, 2024 21:17
danielxiangzl pushed a commit that referenced this pull request Dec 12, 2024
A tool to benchmark execution of past transactions. The user has to provide first and last
versions of the interval that need to be benchmarked. The tool partitions transactions in
the specified closed interval into blocks, and runs all blocks end-to-end, measuring the
time. During this run, executor is shared (and so are environment and module caches).

There is no commit here, only execution time. For each block, we maintain the state
(read-set estimated from 1 run before the benchmarks) on top of which it should run.
And so, the benchmark just runs in sequence blocks on top of their initial states
(outputs are only used for comparison against the on-chain data).

The tool allows one to override configs to experiment with new features, or with how
the execution would look like without some features. For now, we support:
- enabling features
- disabling features
In the future, we can add more overrides: gas schedule, modules, etc.
danielxiangzl pushed a commit that referenced this pull request Dec 12, 2024
georgemitenkov added a commit that referenced this pull request Jan 6, 2025
4 participants