Generate daily benchmark results #2208

lemmih · 2022-11-16T09:13:49Z

Issue summary

There are two key performance metrics that we want to measure and track over time:

Time to load a mainnet snapshot.
Validation time in tipsets per second.

The absolute performance numbers will depend on hardware. But if we compare against Lotus then the ratio of our performance vs. their performance should be relatively stable across different hardware.

We should create a service (as part of forest-iac) that downloads a snapshot, imports it with both Forest and Lotus, runs each client for, say, 30 minutes, and notes down how many tipsets were validated.

The DO droplets have limited disk space so the database should be cleared between Forest and Lotus runs.

The final product might be a CSV file containing the versions of Forest and Lotus, the time it took to load the snapshots, and the validation speed in tipsets per minute. These files should be uploaded to our DO space such that we have one file per day.
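A minimal sketch of what one day's file and the Forest-vs-Lotus ratios could look like. The column names, the `ClientResult` type, and both helper functions are hypothetical; the thread does not fix a schema, and no real measurements are implied.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ClientResult:
    """One client's measurements for a single daily run (hypothetical layout)."""
    client: str             # "forest" or "lotus"
    version: str            # version string reported by the client
    import_seconds: float   # time taken to load the snapshot
    tipsets_validated: int  # tipsets validated during the timed run
    run_minutes: float      # length of the timed run, e.g. 30

    @property
    def tipsets_per_minute(self) -> float:
        return self.tipsets_validated / self.run_minutes


def daily_csv(date: str, results: list[ClientResult]) -> str:
    """Render one CSV document for a single day, one row per client."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "client", "version",
                     "import_seconds", "tipsets_per_minute"])
    for r in results:
        writer.writerow([date, r.client, r.version,
                         f"{r.import_seconds:.1f}",
                         f"{r.tipsets_per_minute:.2f}"])
    return buf.getvalue()


def relative_speed(forest: ClientResult, lotus: ClientResult) -> dict[str, float]:
    """Hardware-independent ratios: Forest's numbers divided by Lotus's."""
    return {
        "import_time_ratio": forest.import_seconds / lotus.import_seconds,
        "validation_ratio": forest.tipsets_per_minute / lotus.tipsets_per_minute,
    }
```

Tracking the ratios rather than the raw numbers is what should keep the results comparable when the droplet hardware changes.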

Other information and links

jdjaustin · 2022-11-29T21:13:14Z

Current plan:

To compare the time to load a mainnet snapshot: for Forest, execute time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import, where [file] is the latest snapshot file; then determine which commands in the Lotus binary yield the corresponding metric.
To compare validation time in tipsets per second: we may be able to execute something similar to ./target/release/forest --config <tbd> --encrypt-keystore false --import-snapshot <tbd> --halt-after-import --skip-load --height 2368640; alternatively, we could have the binary report the epoch, wait a certain amount of time, and have the binary report the epoch again.
Automate these steps in a similar fashion to the PR "Add benchmark script" (#2231).
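The first step of the plan (wall-clock timing of a snapshot import) could be automated roughly as follows. This is a sketch only: `time_command` and `forest_import_argv` are hypothetical helpers, the Forest flags are copied from the command quoted in the plan, and the equivalent Lotus argument list is left to the caller.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed import is never recorded as a valid timing.
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def forest_import_argv(snapshot_path: str) -> list[str]:
    """Build the Forest import command line quoted in the plan."""
    return ["forest", "--chain", "calibnet", "--encrypt-keystore", "false",
            "--import-snapshot", snapshot_path, "--halt-after-import"]


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without Forest installed.
    print(f"{time_command([sys.executable, '-c', 'pass']):.3f}s")
```

Passing the argument list in from outside keeps the timing code identical for both clients, which matters because the comparison is only meaningful if both are measured the same way.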

elmattic · 2022-12-01T12:59:38Z

Maybe not for this PR, but we could also retrieve the amount of gas used per tipset. The gas metric should be directly proportional to the computational workload in the FVM, so we could then deduce gas/s.

Detailed gas costs can be acquired from the StateReplay API endpoint

(from filecoin-project/lotus#3326)
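As a rough illustration of the gas/s idea, assuming the per-tipset gas totals have already been collected by some other means (the function name and its inputs are hypothetical; fetching the values from the API is not shown):

```python
def gas_per_second(gas_used_per_tipset: list[int], elapsed_seconds: float) -> float:
    """Total gas consumed across the validated tipsets, divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(gas_used_per_tipset) / elapsed_seconds
```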

jdjaustin · 2022-12-09T04:00:54Z

Using time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import with latest snapshot:

lemmih · 2022-12-09T07:28:08Z

Using time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import with latest snapshot:

Did you delete your questions about the time being off?

Everything looks fine to me. Imported snapshot in: 5s and real 0m5.681s agree with each other.

jdjaustin · 2022-12-12T23:21:59Z

The next step is to measure the validation time in tipsets per second for Forest. Running forest-cli sync status at different moments in time provides the output displayed in the two screenshots below. Per @elmattic, when this is implemented in the metrics script, we will need to verify that the height is measured at a specific stage.

lemmih · 2022-12-13T08:34:23Z

What metrics script are you referring to?

Are you able to run the nodes manually to get benchmark results for either Forest or Lotus? Running on calibnet would be as good as mainnet when testing. If not, this is a good place to start. Running a task manually is the first step to automating it.

jdjaustin · 2022-12-13T14:55:17Z

What metrics script are you referring to?

Are you able to run the nodes manually to get benchmark results for either Forest or Lotus? Running on calibnet would be as good as mainnet when testing. If not, this is a good place to start. Running a task manually is the first step to automating it.

The plan is to either modify the benchmark script in #2231 or develop another script to cover these metrics. The screenshots above were manual results from Forest running on calibnet. Today my plan is to learn to run Lotus on calibnet.

lemmih · 2022-12-13T15:05:11Z

That benchmark script works very differently from what we're trying to do in this issue. It won't help you run these benchmarks and I doubt it's worth updating the script. Once you've figured out how to run the benchmarks manually, you can find inspiration in @elmattic's script regarding how to automate the process.

jdjaustin · 2022-12-14T01:40:23Z

Manual results from switching to Lotus testnet with make clean calibnet and evaluating snapshot import time with time lotus daemon --import-snapshot [file] --halt-after-import:

jdjaustin · 2022-12-14T01:44:27Z

Similar to Forest, can get the current epoch with lotus sync status while running a node on testnet:

elmattic · 2023-01-06T13:25:13Z

A question came up yesterday with @jdjaustin regarding the second point:

Validation time in tipsets per second.

Should this be measured just after snapshot loading, i.e. when Forest/Lotus are in message sync (SyncStage::Message)?
Or when in follow mode, once HEAD is reached?
I believe we should do it during the former: over, say, 10 minutes, count the number of validated epochs. Maybe we could use APPLY_BLOCKS_TIME as well to create the stat.

lemmih · 2023-01-06T14:53:38Z

The validation rate is constant both before a snapshot has been loaded (at 0 epochs per second) and after HEAD has been reached (at 1 epoch per 30 seconds). We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.

We need to measure both Forest and Lotus so we can't rely on Forest-only metrics.
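The measurement window described above could be sampled in a client-agnostic way along these lines. This is a sketch under assumptions: `get_height` is a hypothetical callable that returns the node's current epoch (for example, by parsing the output of the respective sync status command, which is not shown here), and the injectable `sleep` and `clock` parameters exist only to make the function testable.

```python
import time
from typing import Callable


def epochs_per_second(get_height: Callable[[], int],
                      window_seconds: float,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> float:
    """Sample the height twice, window_seconds apart, and return the rate.

    Only meaningful while the node is catching up, i.e. after the snapshot
    has been loaded and before HEAD has been reached.
    """
    start_height, start_time = get_height(), clock()
    sleep(window_seconds)
    end_height, end_time = get_height(), clock()
    elapsed = end_time - start_time
    if elapsed <= 0:
        raise ValueError("measurement window must have positive duration")
    return (end_height - start_height) / elapsed
```

Because the function depends only on a height callback, the same code can be pointed at either Forest or Lotus without relying on Forest-only metrics.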

jdjaustin · 2023-01-06T15:01:25Z

We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.

Would this be when the Stage is in message sync?

Also @elmattic suggested including memory usage in the snapshot load metrics. Any issues with including that as well in PR #2367?

lemmih · 2023-01-06T15:09:56Z

We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.

Would this be when the Stage is in message sync?

I believe so, yes.

Also @elmattic suggested including memory usage in the snapshot load metrics. Any issues with including that as well in PR #2367?

Sure, tracking peak memory usage would be nice as well.

elmattic · 2023-01-17T10:02:05Z

elmattic · 2023-02-09T17:40:47Z

Removed some sub-tasks, we can do them in another PR.

elmattic · 2023-02-09T17:42:13Z

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

LesnyRumcajs · 2023-02-10T07:38:20Z

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

It's still a useful metric. You never know when someone will introduce such a leak by mistake, and with a daily benchmark, we can quickly pinpoint where the regression happened.

elmattic · 2023-02-10T10:09:32Z

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

It's still a useful metric. You never know when someone will introduce such a leak by mistake, and with a daily benchmark, we can quickly pinpoint where the regression happened.

Yeah, I agree. I just wanted to finish this PR faster so we can move the script to the iac repo.

That said, from my experience running Forest for a week, such a metric will be hard to obtain: RSS can fluctuate between 7.5 and 8.5 GB.

lemmih · 2023-06-02T08:25:36Z

Unless I'm misreading the code, this issue is not done.

elmattic · 2023-06-02T08:36:44Z

@lemmih What exactly is missing regarding the script itself?

If it's the iac part, a new issue has been opened here: ChainSafe/forest-iac#92

lemmih · 2023-06-02T08:40:17Z

@elmattic This issue is for comparing Forest against Lotus. We want to know, not the absolute numbers for loading a snapshot, but the relative numbers compared against Lotus. This Ruby script doesn't do this at all. The script is nice and all but it doesn't even try to solve the problem described in this issue.

lemmih · 2023-06-02T08:44:55Z

Unless I'm misreading the code, the benchmark script can compare Forest in different configurations (say, mimalloc vs. jemalloc). While that may be useful in its own right, that's completely different from what this issue asks for.

lemmih · 2023-06-02T08:57:23Z

Hmm, I think I see where the confusion comes from. Josh solved this issue: #2714

lemmih · 2023-06-02T09:59:51Z

My bad. Was looking at the wrong thing.

jdjaustin self-assigned this Nov 28, 2022

lemmih added the "Priority: 2 - High" label and removed the "Priority: 3 - Medium" and "Status: Needs Triage" labels Dec 7, 2022

lerajk added this to Forest 🌲 Board [depreciated] Dec 12, 2022

jdjaustin mentioned this issue Dec 15, 2022

Generate Daily Benchmark Results #2367

Merged

elmattic self-assigned this Jan 16, 2023

elmattic added this to the Forest 🌲 Infrastructure milestone Jan 16, 2023

jdjaustin mentioned this issue Mar 17, 2023

Improvements to Daily Benchmarks Script #2683

Closed

6 tasks

jdjaustin mentioned this issue May 15, 2023

Automate Comparison of Checksums for Forest and Lotus Snapshots #2559

Closed

elmattic closed this as completed in #2367 May 23, 2023

lemmih reopened this Jun 2, 2023

lemmih mentioned this issue Jun 2, 2023

Run Daily Benchmark Script Automagically ChainSafe/forest-iac#92

Closed

5 tasks

lemmih closed this as completed Jun 2, 2023
