[Performance] microbenchmark framework #3141

ZhiHanZ · 2021-11-28T15:18:00Z

Summary
To track query engine performance and catch performance regressions, we should have a generic microbenchmark framework that runs with each nightly release.
Microbenchmarks are intended as a routine guard against performance regressions and play a supplementary role in performance engineering. Compared with e2e benchmarks, microbenchmarks rely on static data and try to catch regressions at each release (or commit) by running the same query against different storage backends.
Each performance test should be able to target a different storage backend for a given number of iterations, and should scale to cluster-level tests.
A performance benchmark should behave like the following pseudocode:

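def bench_s3(i: iteration, r: reference) {
    c = init_query_cluster(s3)   # start a query cluster backed by s3
    execute_query(c, i)          # run the benchmark query i times
    m = collect_metrics(c)       # gather per-query metrics from the cluster
    report(m)                    # publish the metrics
    compare_with(m, r)           # compare against the reference run r
}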
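
As a rough illustration of the flow above, here is a minimal Python sketch of the iterate, collect, and average steps. It is not taken from Databend: `bench`, `run_query`, and `fake_query` are hypothetical names, and `run_query` stands in for whatever actually executes one query against a cluster configured for a given storage backend and returns per-stage timings.

```python
import statistics
import time
from typing import Callable, Dict, List


def bench(run_query: Callable[[], Dict[str, float]], iterations: int) -> Dict[str, float]:
    """Run the query `iterations` times and average each reported metric.

    `run_query` is a hypothetical stand-in: it executes one query and returns
    per-stage timings in seconds, e.g. {"planner": 0.01, "optimizer": 0.02}.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    samples: Dict[str, List[float]] = {}
    for _ in range(iterations):
        start = time.perf_counter()
        stage_timings = run_query()
        total = time.perf_counter() - start
        # Record the wall-clock total alongside the stage timings.
        for name, seconds in {**stage_timings, "total": total}.items():
            samples.setdefault(name, []).append(seconds)
    return {name: statistics.mean(values) for name, values in samples.items()}


def fake_query() -> Dict[str, float]:
    # Placeholder workload with fixed stage timings, for illustration only.
    return {"planner": 1.0, "optimizer": 2.0}


averages = bench(fake_query, iterations=3)
```

Passing the query as a callable keeps the harness independent of the storage backend, so the same loop could in principle be reused for each backend by swapping the callable.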

Output:
Query metrics for each benchmark query

Metric	Average
planner	1s
optimizer	2s
...	...

Overall summary

benchmark name	branch name	time	compare
memory sum	12e34	10s	+20%
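
One possible way to produce the compare column is sketched below in Python. This assumes the column means the relative change in time against the reference run, with a positive value meaning the current run is slower; the issue does not define it precisely. The function names and sample values are hypothetical.

```python
def compare_with_reference(current_s: float, reference_s: float) -> str:
    """Return the relative change versus the reference run, e.g. '+20.0%'."""
    if reference_s <= 0:
        raise ValueError("reference time must be positive")
    change = (current_s - reference_s) / reference_s * 100.0
    return f"{change:+.1f}%"


def summary_row(name: str, branch: str, current_s: float, reference_s: float) -> str:
    # Tab-separated to match the table layout used above.
    return "\t".join(
        [name, branch, f"{current_s:g}s", compare_with_reference(current_s, reference_s)]
    )


# Illustrative values only: a 12s run compared with a 10s reference.
row = summary_row("memory sum", "12e34", 12.0, 10.0)
```

A nightly job could fail or raise a warning when the change exceeds a chosen threshold; the threshold itself is a policy decision that is not specified here.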

Possible implementation
Adopt a CI tool such as Cirrus CI, which provides a consistent container environment running on k8s, so that benchmarks run in a uniform way.
Criterion.rs (https://bheisler.github.io/criterion.rs/book/criterion_rs.html) may be a good library for implementing the microbenchmarks.

Should fix:
#3084

Reference:
http://www.vldb.org/pvldb/vol13/p3285-gruenheid.pdf

BohuTANG added the A-query Area: databend query label Nov 29, 2021

ZhiHanZ closed this as completed May 22, 2023
