-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new benchmark service #122
Conversation
Forest: DB Benchmark Service Infrastructure Plan: successShow Plan
|
benchmark_db_terraform/modules/benchmark_db/service/run_benchmark.sh
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start.
benchmark_db_terraform/modules/benchmark_db/service/Gemfile.lock
Outdated
Show resolved
Hide resolved
benchmark_db_terraform/modules/benchmark_db/service/docker-compose.yml
Outdated
Show resolved
Hide resolved
benchmark_db_terraform/modules/benchmark_db/service/run_benchmark.rb
Outdated
Show resolved
Hide resolved
benchmark_db_terraform/modules/benchmark_db/service/run_benchmark.sh
Outdated
Show resolved
Hide resolved
RUN apt-get update && \ | ||
apt-get install --no-install-recommends -y \ | ||
ruby ruby-dev make gcc git git-lfs curl wget pkg-config clang build-essential \ | ||
mesa-opencl-icd ocl-icd-opencl-dev hwloc libhwloc-dev s3cmd aria2 zstd time bzr jq pkg-config && \ | ||
apt-get clean && \ | ||
rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled. Why do we need all of those dependencies? Could we just use Docker? We want to use the latest releases of both Forest and Lotus, which are conveniently dockerized. This would reduce the amount of manual work in the benchmark itself.
That said, I'm okay with doing this in a separate issue. If so, let's create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a separate issue for this
results[:import_time][key.to_sym] = "#{elapsed} sec" | ||
results[:validation_time][key.to_sym] = "#{tpm} tipsets/sec" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's wasteful to include the unit in every row of the result. This should be a part of the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import_time
and validation_time
comes under Metric
header, each metric will have its own unit
we will have to restructure the output format
I can take this out in a separate PR
ruby bench.rb --chain mainnet --tempdir ./tmp --daily | ||
|
||
## Upload benchmark result to s3 weekly file | ||
week_number=$(date +%W) # Week starting on Monday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is overwriting benchmarks from January 2023 with results from January 2024 expected and by design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added year to avoid this condition
|
||
## Upload benchmark result to s3 weekly file | ||
week_number=$(date +%W) # Week starting on Monday | ||
s3cmd get s3://"$BENCHMARK_BUCKET"/benchmark-results/weekly-results/weekly_result_"$week_number".csv /tmp/weekly_result_"$week_number".csv --force || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the /tmp/weekly_result_"$week_number".csv
is repeated many times. Let's use a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look that much into Ruby scripts themselves, given they're mostly imported from Forest. All in all, it'd be good to have some more form in them and potentially unit tests (and a way to easily run them locally). We can think about it in another PR.
Another thing is compiling Rust and Lotus manually. This shouldn't be needed with Docker images and would simplify dependencies and initialization of the benchmark.
Also, I'd like to know the plans for CSV migrations, e.g., what do we do when a column is added/removed/modified in the CSV. We can no longer easily compare the results in such case anymore (well, adding a column should be fine).
week_number=$(date +%W) # Week starting on Monday | ||
s3cmd get s3://"$BENCHMARK_BUCKET"/benchmark-results/weekly-results/weekly_result_"$week_number".csv /tmp/weekly_result_"$week_number".csv --force || | ||
echo "Timestamp,Forest Version,Lotus Version,Chain,Metric,Forest Value,Lotus Value" > /tmp/weekly_result_"$week_number".csv | ||
tail -n +2 -q /chainsafe/result_*.csv >> /tmp/weekly_result_"$week_number".csv && s3cmd --acl-public put /tmp/weekly_result_"$week_number".csv s3://"$BENCHMARK_BUCKET"/benchmark-results/weekly-results/weekly_result_"$week_number".csv || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan to deal with breaking changes? Looking at the existing file at https://forest-benchmarks.fra1.digitaloceanspaces.com/benchmark-results/all_results.csv, there were some at one point:
2023-07-05 23:43:49,0.11.0,1.23.4,calibnet,validate_online,11.8,8.0
2023-07-06 01:48:32,0.11.0,1.23.4,mainnet,import,1443,2094
2023-07-06 01:48:32,0.11.0,1.23.4,mainnet,validate_online,0.209,0.175
2023-07-06 02:36:54,0.11.0,1.23.4,calibnet,import,41.1,96
2023-07-06 02:36:54,0.11.0,1.23.4,calibnet,validate_online,2.5,2.6
2023-07-06 04:43:39,0.11.0,1.23.4,mainnet,import,1480,2082
2023-07-06 04:43:39,0.11.0,1.23.4,mainnet,validate_online,0.275,0.184
2023-07-06 16:07:48,0.11.1,1.23.4,calibnet,import_time,44.8 sec,102 sec
2023-07-06 16:07:48,0.11.1,1.23.4,calibnet,validation_time,11.7 tipsets/sec,7.4 tipsets/sec
2023-07-06 18:15:00,0.11.1,1.23.4,mainnet,import_time,1415 sec,2054 sec
2023-07-06 18:15:00,0.11.1,1.23.4,mainnet,validation_time,0.159 tipsets/sec,0.092 tipsets/sec
2023-07-06 16:07:48,0.11.1,1.23.4,calibnet,import_time,44.8 sec,102 sec
2023-07-06 16:07:48,0.11.1,1.23.4,calibnet,validation_time,11.7 tipsets/sec,7.4 tipsets/sec
2023-07-06 18:15:00,0.11.1,1.23.4,mainnet,import_time,1415 sec,2054 sec
2023-07-06 18:15:00,0.11.1,1.23.4,mainnet,validation_time,0.159 tipsets/sec,0.092 tipsets/sec
2023-07-06 19:03:26,0.11.1,1.23.4,calibnet,import_time,43.6 sec,101 sec
2023-07-06 19:03:26,0.11.1,1.23.4,calibnet,validation_time,12.3 tipsets/sec,7.6 tipsets/sec
2023-07-06 21:07:41,0.11.1,1.23.4,mainnet,import_time,1400 sec,2110 sec
2023-07-06 21:07:41,0.11.1,1.23.4,mainnet,validation_time,0.267 tipsets/sec,0.192 tipsets/sec
2023-07-06 16:07:48,0.11.1,1.23.4,calibnet,import_time,44.8 sec,102 sec
Should the former ones be cleared? What is the plan for the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will take care in a separate PR
|
||
s3cmd get s3://"$BENCHMARK_BUCKET"/benchmark-results/all_results.csv /tmp/all_results.csv --force || | ||
echo "Timestamp,Forest Version,Lotus Version,Chain,Metric,Forest Value,Lotus Value" > /tmp/all_results.csv | ||
tail -n +2 -q /chainsafe/result_*.csv >> /tmp/all_results.csv && s3cmd --acl-public put /tmp/all_results.csv s3://"$BENCHMARK_BUCKET"/benchmark-results/all_results.csv || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to exit 1
in case of an error? Shouldn't set -e
that is already in the file work fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
# Current datetime, to append to the log files | ||
datetime = Time.new.strftime '%FT%H:%M:%S' | ||
health_log = "#{LOG_DIR}/benchmark_#{datetime}_health" | ||
sync_log = "#{LOG_DIR}/benchmark_#{datetime}_sync" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is sync and health and sync log exactly and how are they created / populated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, now there is a log file benchmark_#{datetime}_run
which contains actual benchmark run logs and another benchmark_#{datetime}_report
which contains high-level log of benchmark run like when it starts, fails or complete.
Forest: Benchmark Service Infrastructure Plan: successShow Plan
|
Forest: Benchmark Service Infrastructure Plan: successShow Plan
|
Forest: Benchmark Service Infrastructure Plan: successShow Plan
|
Forest: Benchmark Service Infrastructure Plan: successShow Plan
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the concurrency as mentioned in one of the comments. Otherwise, I'm okay with merging this, provided we address improvements in a separate PR soon.
``` | ||
|
||
To deploy the service: | ||
```bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the docs incase someone wants to run manually?. you can check the sync check
and snapshot services
docs as reference
@lemmih @LesnyRumcajs don’t you guys think we should also set up new relic to monitor the droplet? |
@samuelarogbonlo let's do it in a separate PR and not increase the load on NR until we are sure that the fixes you made land us well below the free tier limit. |
By the way, shall we delete the benchmark script from the Forest repository now? @sudo-shashank |
yes we can once this is merged |
Forest: Benchmark Service Infrastructure Plan: successShow Plan
|
@lemmih can you approve this PR, its waiting for your approval to be merged |
Summary of changes
Changes introduced in this pull request:
Added a new benchmark service:
forest-benchmark
weekly-results and all_results file.#forest-notifications
channel.Reference issue to close (if applicable)
Closes #92
Other information and links