This project is an ES Rally benchmark that measures performance of the date_histogram aggregation for various workloads.
It was created to reproduce a performance issue reported on the Elastic forums and in GitHub issues.
Test datasets with different workloads have been uploaded here.
To run the benchmarks:
- Install the latest version of Rally, as described in the official Rally documentation.
- Configure Rally using `esrally configure`.
- Edit `~/.rally/rally.ini` and add the `date_histogram-benchmark` track in the `[tracks]` section as shown below (more details in the Rally docs):

      [tracks]
      default.url = https://github.com/elastic/rally-tracks
      date_histogram-benchmark.url = https://github.com/csoulios/date_histogram-benchmark

- Run the Rally track with any of the supported challenges:

      esrally --on-error=abort --track-repository=date_histogram-benchmark --distribution-version=[elasticsearch_version] --track date_histogram --challenge=[challenge_name]

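
When several challenges or Elasticsearch versions need to be benchmarked, it can help to assemble the command programmatically. The following is a minimal sketch that only rebuilds the invocation shown above; the function name `build_rally_command` and the example version `7.10.0` are illustrative assumptions, and the challenge name is left as a placeholder because the exact names are defined in the track itself.

```python
import shlex


def build_rally_command(es_version: str, challenge: str) -> list[str]:
    """Return the esrally invocation from the steps above as an argument list.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    return [
        "esrally",
        "--on-error=abort",
        "--track-repository=date_histogram-benchmark",
        f"--distribution-version={es_version}",
        "--track",
        "date_histogram",
        f"--challenge={challenge}",
    ]


# "7.10.0" is an arbitrary example version; replace the challenge placeholder
# with one of the challenge names defined in the track.
command = build_rally_command("7.10.0", "[challenge_name]")

# Print a shell-safe rendering of the command for review or copy/paste.
print(shlex.join(command))
```

The argument list can also be passed to `subprocess.run(command, check=True)` to start the benchmark directly, provided Rally is installed and configured as described above.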
A separate challenge loads each of the datasets, which differ in how their documents are distributed over time:
- timestamps-gaussian-sameday: This dataset represents the actual distribution of log data during a production day. It is a Gaussian distribution centered around lunch time (more documents during the day than at night). All documents fit within the same day.
- timestamps-uniform-sameday: All documents fit within the same day but are evenly distributed (the same number of documents every hour).
- timestamps-uniform-1s: Documents are spaced one second apart (the first starts at 2000-01-01T00:00:00.000Z, the next is 1 second later).
- timestamps-uniform-10s: Documents are spaced 10 seconds apart.
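
To make the four distributions concrete, the sketch below generates small sets of timestamps with the same shapes. It is not the script used to build the published datasets. The function names, the standard deviation of 3 hours, the noon center, and the use of 2000-01-01 as the day for every dataset are assumptions made for illustration (the source only states that start time for the 1s dataset).

```python
import random
from datetime import datetime, timedelta, timezone

# Start time stated for the 1s dataset; assumed here for the other datasets too.
DAY_START = datetime(2000, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 24 * 60 * 60


def uniform_spaced(count, gap_seconds):
    """Timestamps separated by a fixed gap (shape of the 1s and 10s datasets)."""
    return [DAY_START + timedelta(seconds=i * gap_seconds) for i in range(count)]


def uniform_sameday(count):
    """Evenly spaced timestamps that all fall within a single day."""
    step = SECONDS_PER_DAY / count
    return [DAY_START + timedelta(seconds=i * step) for i in range(count)]


def gaussian_sameday(count, mean_hour=12.0, stddev_hours=3.0, seed=0):
    """Timestamps drawn from a Gaussian centered on mean_hour, kept within one day.

    mean_hour and stddev_hours are assumed values, not taken from the datasets.
    """
    rng = random.Random(seed)
    result = []
    while len(result) < count:
        offset = rng.gauss(mean_hour, stddev_hours) * 3600
        if 0 <= offset < SECONDS_PER_DAY:  # reject samples outside the day
            result.append(DAY_START + timedelta(seconds=offset))
    return sorted(result)


if __name__ == "__main__":
    for ts in uniform_spaced(3, 10):
        print(ts.isoformat())
```

Samples that fall outside the day are rejected rather than clipped, so documents do not pile up at midnight in the Gaussian case.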
Special thanks to Bertrand Renuart for reporting this issue and creating the benchmark dataset.