Generate daily benchmark results #2208

Closed
lemmih opened this issue Nov 16, 2022 · 25 comments · Fixed by #2367
Labels
Performance Priority: 2 - High Very important and should be addressed ASAP Ready Issue is ready for work and anyone can freely assign it to themselves Type: Enhancement

Comments

@lemmih
Contributor

lemmih commented Nov 16, 2022

Issue summary

There are two key performance metrics that we want to measure and track over time:

  1. Time to load a mainnet snapshot.
  2. Validation time in tipsets per second.

The absolute performance numbers will depend on hardware. But if we compare against Lotus then the ratio of our performance vs. their performance should be relatively stable across different hardware.

We should create a service (as part of forest-iac) that downloads a snapshot, imports it with both Forest and Lotus, runs each client for, say, 30 minutes, and notes down how many tipsets were validated.

The DO droplets have limited disk space so the database should be cleared between Forest and Lotus runs.

The final product might be a CSV file containing the versions of Forest and Lotus, the time it took to load the snapshots, and the validation speed in tipsets per minute. These files should be uploaded to our DO space such that we have one file per day.
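The relative metric and the daily CSV described above could be sketched as follows. The column names, the file-naming scheme, and the `ClientResult` shape are assumptions for illustration, not an agreed format.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class ClientResult:
    """One client's measurements for a single daily run (hypothetical shape)."""
    client: str             # "forest" or "lotus"
    version: str            # version string reported by the client
    import_seconds: float   # time taken to load the snapshot
    tipsets_validated: int  # tipsets validated during the run window
    window_minutes: float   # length of the run window, e.g. 30

    @property
    def tipsets_per_minute(self) -> float:
        return self.tipsets_validated / self.window_minutes


def daily_filename(day: date) -> str:
    # One file per day; this naming scheme is an assumption.
    return f"benchmark-{day.isoformat()}.csv"


def to_csv(results: list[ClientResult]) -> str:
    """Render the results as CSV text, one row per client."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["client", "version", "import_seconds", "tipsets_per_minute"])
    for r in results:
        writer.writerow([r.client, r.version, r.import_seconds, r.tipsets_per_minute])
    return out.getvalue()


def relative_speed(forest: ClientResult, lotus: ClientResult) -> float:
    """Forest validation speed as a multiple of Lotus (above 1 means Forest is faster)."""
    return forest.tipsets_per_minute / lotus.tipsets_per_minute
```

Because both clients run on the same machine on the same day, `relative_speed` is the hardware-independent number; the absolute columns are kept so a run can be sanity-checked later.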

Other information and links

@lemmih lemmih added Priority: 3 - Medium Nice-to-have, does not impede core functionality Type: Enhancement Status: Needs Triage Issue has unresolved discussions and/or needs to be assigned a priority and assignee Performance Ready Issue is ready for work and anyone can freely assign it to themselves labels Nov 16, 2022
@jdjaustin jdjaustin self-assigned this Nov 28, 2022
@jdjaustin
Contributor

jdjaustin commented Nov 29, 2022

Current plan:

  • to compare the time to load a mainnet snapshot: for Forest, run time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import, where [file] is the latest snapshot file; then find the Lotus commands that yield the corresponding metric.
  • to compare validation time in tipsets per second: we may be able to run something like ./target/release/forest --config <tbd> --encrypt-keystore false --import-snapshot <tbd> --halt-after-import --skip-load --height 2368640; alternatively, we could have the binary report the epoch, wait a fixed amount of time, and have the binary report the epoch again.
  • automate these steps in a similar fashion to PR Add benchmark script #2231
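As a rough sketch of how the timing step might be automated, the helper below measures the wall-clock time of any command. The function names are hypothetical, and the Forest flags are copied verbatim from the plan above; nothing here is taken from the benchmark script in #2231.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> tuple[float, int]:
    """Run a command to completion; return (wall-clock seconds, exit code)."""
    start = time.monotonic()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, completed.returncode


def forest_import_argv(snapshot_path: str) -> list[str]:
    # Flags as quoted in the plan; the snapshot path is supplied by the caller.
    return ["forest", "--chain", "calibnet", "--encrypt-keystore", "false",
            "--import-snapshot", snapshot_path, "--halt-after-import"]


# Usage with a harmless stand-in command (the Python interpreter itself):
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={elapsed:.3f}s")
```

Using a monotonic clock avoids skew if the system clock is adjusted during a long import.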

@elmattic
Contributor

elmattic commented Dec 1, 2022

Maybe not for this PR, but we could retrieve the amount of gas used per tipset as well. The gas metric should be directly proportional to the computational workload in the FVM. From that we could derive gas/s.

Detailed gas costs can be acquired from the StateReplay API endpoint

(from filecoin-project/lotus#3326)

@lemmih lemmih added Priority: 2 - High Very important and should be addressed ASAP and removed Priority: 3 - Medium Nice-to-have, does not impede core functionality Status: Needs Triage Issue has unresolved discussions and/or needs to be assigned a priority and assignee labels Dec 7, 2022
@jdjaustin
Contributor

Using time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import with latest snapshot:
Screenshot from 2022-12-08 21-53-18

@lemmih
Contributor Author

lemmih commented Dec 9, 2022

Using time forest --chain calibnet --encrypt-keystore false --import-snapshot [file] --halt-after-import with latest snapshot: Screenshot from 2022-12-08 21-53-18

Did you delete your questions about the time being off?

Everything looks fine to me. Imported snapshot in: 5s and real 0m5.681s agree with each other.

@jdjaustin
Contributor

The next step is to measure the validation time in tipsets per second for Forest. Running forest-cli sync status at different moments in time provides the output displayed in the two screenshots below. Per @elmattic, when this is implemented in the metrics script, we will need to verify that the height is measured at a specific stage.
Screenshot from 2022-12-12 17-09-49
Screenshot from 2022-12-12 17-10-05

@lemmih
Contributor Author

lemmih commented Dec 13, 2022

What metrics script are you referring to?

Are you able to run the nodes manually to get benchmark results for either Forest or Lotus? Running on calibnet would be as good as mainnet when testing. If not, this is a good place to start. Running a task manually is the first step to automating it.

@jdjaustin
Contributor

What metrics script are you referring to?

Are you able to run the nodes manually to get benchmark results for either Forest or Lotus? Running on calibnet would be as good as mainnet when testing. If not, this is a good place to start. Running a task manually is the first step to automating it.

Plan to either modify the benchmark script in #2231 or develop another script to cover these metrics. The screenshots above were manual results from Forest running on calibnet. Today my plan is to learn to run Lotus on calibnet.

@lemmih
Contributor Author

lemmih commented Dec 13, 2022

That benchmark script works very differently from what we're trying to do in this issue. It won't help you run these benchmarks and I doubt it's worth updating the script. Once you've figured out how to run the benchmarks manually, you can find inspiration in @elmattic's script regarding how to automate the process.

@jdjaustin
Contributor

Manual results from switching to Lotus testnet with make clean calibnet and evaluating snapshot import time with time lotus daemon --import-snapshot [file] --halt-after-import:
Screenshot from 2022-12-13 19-36-00

@jdjaustin
Contributor

Similar to Forest, can get the current epoch with lotus sync status while running a node on testnet:
Screenshot from 2022-12-13 19-42-50
Screenshot from 2022-12-13 19-43-00

@elmattic
Contributor

elmattic commented Jan 6, 2023

A question came up yesterday with @jdjaustin about the second point:

  1. Validation time in tipsets per second.

Should this be measured just after snapshot loading, i.e. while Forest/Lotus are in message sync (SyncStage::Message)?
Or in follow mode, once HEAD is reached?
I believe we should measure during the former: run for, say, 10 minutes and count the number of validated epochs. Maybe we could use APPLY_BLOCKS_TIME as well to create the stat.

@lemmih
Contributor Author

lemmih commented Jan 6, 2023

The validation rate is constant both before a snapshot has been loaded (at 0 epochs per second) and after HEAD has been reached (at 1 epoch per 30 seconds). We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.
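A minimal sketch of that measurement, assuming the height is sampled twice from each client's sync status output (parsing that output is left to the caller, since its format is not specified here). The type and function names are hypothetical.

```python
from dataclasses import dataclass

EPOCH_SECONDS = 30  # at HEAD, one epoch per 30 seconds (from the comment above)


@dataclass
class HeightSample:
    unix_seconds: float  # when the height was read
    height: int          # epoch reported by the client at that moment


def epochs_per_second(first: HeightSample, second: HeightSample) -> float:
    """Average validation rate between two height samples."""
    elapsed = second.unix_seconds - first.unix_seconds
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    return (second.height - first.height) / elapsed


def looks_like_catch_up(rate: float) -> bool:
    """True if the rate exceeds the HEAD-following rate of 1 epoch per 30 s."""
    return rate > 1 / EPOCH_SECONDS
```

A rate at or below 1/30 epochs per second suggests the node had already reached HEAD (or had not started validating), so that sample would not reflect catch-up speed.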

We need to measure both Forest and Lotus so we can't rely on Forest-only metrics.

@jdjaustin
Contributor

We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.

Would this be when the Stage is in message sync?

Also @elmattic suggested including memory usage in the snapshot load metrics. Any issues with including that as well in PR #2367?

@lemmih
Contributor Author

lemmih commented Jan 6, 2023

We need to measure the epochs per second after a snapshot has been loaded and before HEAD has been reached.

Would this be when the Stage is in message sync?

I believe so, yes.

Also @elmattic suggested including memory usage in the snapshot load metrics. Any issues with including that as well in PR #2367?

Sure, tracking peak memory usage would be nice as well.
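One possible way to capture peak memory for a snapshot import is the operating system's resource accounting for child processes. This is a sketch under assumptions: it is Unix-only, the helper name is hypothetical, and `ru_maxrss` is reported in KiB on Linux (bytes on macOS).

```python
import resource
import subprocess
import sys


def run_and_peak_rss(argv: list[str]) -> tuple[int, int]:
    """Run a command; return (exit code, peak RSS of waited-for children).

    The peak is a high-water mark over every child this process has waited
    for so far, so use a fresh process per measurement.
    """
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Usage with a harmless stand-in command (the Python interpreter itself):
exit_code, peak = run_and_peak_rss([sys.executable, "-c", "pass"])
print(f"exit={exit_code} peak_rss={peak}")
```

Running the same wrapper around both the Forest and the Lotus import command would keep the memory figures comparable.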

@elmattic elmattic self-assigned this Jan 16, 2023
@elmattic elmattic added this to the Forest 🌲 Infrastructure milestone Jan 16, 2023
@elmattic
Contributor

elmattic commented Jan 17, 2023

Adding a few subtasks that we need to address:

  • 1. Refactor snapshot path:
    We should make sure that the path of the snapshot is correct from the get-go and that the file exists. Also remove the snapshot_dir method (it won't work for the LotusBenchmark class).
  • 2. Implement missing methods to make online validation work for Forest and uncomment line after testing both mainnet and calibnet.
  • 3. Implement missing methods for LotusBenchmark class:
    db_size
    clean_db
  • 4. Implement writing results to a .csv file for the daily benchmark. Also support writing the commit hash and snapshot URL to the file so we can easily reproduce a benchmark a few days later.
  • 5. Support for building clients in a temp directory, handling git clone, git checkout in the build command, expose tag/hash to the Benchmark class. Choose sensible default values.
  • 6. Refactor class hierarchy to have a base benchmark class and move common methods there.
  • 7. Support passing custom config for LotusBenchmark. Add support for Lotus db in temp dir.
  • 8. Add a new "peak" memory benchmark step.
    Once a new snapshot has been imported and the node synced up to HEAD, let it run for 2 hours and measure how RSS and VSZ are evolving. Propose a way to represent this metric.
  • 9. Add a snapshot fetch step to the script (for both db and daily benchmarks) in case no snapshot file is given.
  • 10. Fix rubocop errors. This can be done as an ongoing task and not just at the very last moment.
  • 11. Move the script as part of forest-iac.
  • 12. Update documentation.
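Subtasks 3 and 6 could look roughly like the sketch below. It is written in Python for illustration only and is not the Ruby script's code; `ForestBenchmark` and the constructor signature are hypothetical, while `db_size`, `clean_db`, and `LotusBenchmark` are the names used in the list above.

```python
import shutil
import tempfile
from pathlib import Path


class Benchmark:
    """Shared behaviour; subclasses only say which client they drive."""
    client = "base"

    def __init__(self, db_dir: Path):
        self.db_dir = db_dir

    def db_size(self) -> int:
        """Total size in bytes of all regular files under the database directory."""
        return sum(p.stat().st_size for p in self.db_dir.rglob("*") if p.is_file())

    def clean_db(self) -> None:
        """Delete the database directory and recreate it empty."""
        shutil.rmtree(self.db_dir, ignore_errors=True)
        self.db_dir.mkdir(parents=True, exist_ok=True)


class ForestBenchmark(Benchmark):
    client = "forest"


class LotusBenchmark(Benchmark):
    client = "lotus"


# Usage against a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    bench = LotusBenchmark(Path(tmp) / "db")
    bench.clean_db()
    (bench.db_dir / "blob").write_bytes(b"x" * 10)
    size_before = bench.db_size()
    bench.clean_db()
    size_after = bench.db_size()
```

Keeping `db_size` and `clean_db` on the base class means the Lotus variant only needs to point at its own database directory.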

@elmattic
Contributor

elmattic commented Feb 9, 2023

Removed some sub-tasks; we can do them in another PR.

@elmattic
Contributor

elmattic commented Feb 9, 2023

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

@LesnyRumcajs
Member

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

It's still a useful metric. You never know when someone will introduce such a leak by mistake, and with a daily benchmark, we can quickly pinpoint where the regression happened.

@elmattic
Contributor

elmattic commented Feb 10, 2023

Peak memory benchmark is not really needed since our memory leak was fixed by @hanabi1224.

It's still a useful metric. You never know when someone will introduce such a leak by mistake, and with a daily benchmark, we can quickly pinpoint where the regression happened.

Yeah, I agree. I just wanted to finish this PR faster so we can move the script to the iac repo.

That said, from my experience running Forest for a week, it will be hard to get a stable metric: RSS can fluctuate between 7.5 and 8.5 GB.
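One way to represent a fluctuating RSS is to report a few summary statistics over periodic samples instead of a single number. This is only a sketch; the function name, the sampling approach, and the choice of statistics are assumptions.

```python
import statistics


def summarize_rss(samples_gb: list[float]) -> dict[str, float]:
    """Summarize periodic RSS samples (in GB) as min, median, max, and spread."""
    if not samples_gb:
        raise ValueError("need at least one sample")
    low, high = min(samples_gb), max(samples_gb)
    return {
        "min": low,
        "median": statistics.median(samples_gb),
        "max": high,
        "spread": high - low,
    }
```

Comparing the median across days is less sensitive to momentary spikes than comparing the peak, while a growing minimum over a long run would hint at a leak.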

@lemmih
Contributor Author

lemmih commented Jun 2, 2023

Unless I'm misreading the code, this issue is not done.

@lemmih lemmih reopened this Jun 2, 2023
@elmattic
Contributor

elmattic commented Jun 2, 2023

@lemmih What exactly is missing regarding the script itself?

If it's the iac part, a new issue has been opened here: ChainSafe/forest-iac#92

@lemmih
Contributor Author

lemmih commented Jun 2, 2023

@elmattic This issue is for comparing Forest against Lotus. We want to know, not the absolute numbers for loading a snapshot, but the relative numbers compared against Lotus. This Ruby script doesn't do this at all. The script is nice and all but it doesn't even try to solve the problem described in this issue.

@lemmih
Contributor Author

lemmih commented Jun 2, 2023

Unless I'm misreading the code, the benchmark script can compare Forest in different configurations (say, mimalloc vs. jemalloc). While that may be useful in its own right, that's completely different from what this issue asks for.

@lemmih
Contributor Author

lemmih commented Jun 2, 2023

Hmm, I think I see where the confusion comes from. Josh solved this issue: #2714

@lemmih
Contributor Author

lemmih commented Jun 2, 2023

My bad. Was looking at the wrong thing.

@lemmih lemmih closed this as completed Jun 2, 2023

4 participants