-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Figure out how to benchmark the ingest performance for the Kubernetes integration #44
Comments
First, I'll read https://esrally.readthedocs.io/en/stable/quickstart.html to learn the Rally basics. |
Installpipx install esrally |
Checking the existing Rally benchmark for K8s at https://github.com/elastic/k8s-integration-infra/tree/cloudnative/cloudnative |
ClusterUsing a local cluster. TracksCreate a track $ esrally create-track --track="container-logs" --target-hosts=https://localhost:9200 --data-stream="logs-kubernetes.container_logs*" --client-options="use_ssl:true,verify_certs:false,ca_certs:false,basic_auth_user:'elastic',basic_auth_password:'changeme'"
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Connected to Elasticsearch cluster [5fbd93b394ed] version [8.10.0-SNAPSHOT].
Extracting documents for index [.ds-logs-kubernetes.co... 1000/1000 docs [100.0% done]
Extracting documents for index [.ds-logs-kubernete... 353379/353379 docs [100.0% done]
Extracting documents for index [.ds-logs-kubernetes.co... 1000/1000 docs [100.0% done]
Extracting documents for index [.ds-logs-kubernetes.... 10185/10185 docs [100.0% done]
[INFO] Track container-logs has been created. Run it with: esrally --track-path=/Users/zmoog/code/projects/elastic/k8s-integration-infra/container-logs/tracks/container-logs
---------------------------------
[INFO] SUCCESS (took 113 seconds)
--------------------------------- Checking the results: $ find .
.
./tracks
./tracks/container-logs
./tracks/container-logs/.ds-logs-kubernetes.container_logs.agents-default-2023.08.30-000001-documents.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs-default-2023.08.30-000001-documents.json.bz2
./tracks/container-logs/.ds-logs-kubernetes.container_logs.agents-default-2023.08.30-000001-documents-1k.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs-default-2023.08.30-000001-documents-1k.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs.agents-default-2023.08.30-000001-documents-1k.json.bz2
./tracks/container-logs/.ds-logs-kubernetes.container_logs.agents-default-2023.08.30-000001.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs-default-2023.08.30-000001-documents.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs-default-2023.08.30-000001-documents-1k.json.bz2
./tracks/container-logs/.ds-logs-kubernetes.container_logs-default-2023.08.30-000001.json
./tracks/container-logs/track.json
./tracks/container-logs/.ds-logs-kubernetes.container_logs.agents-default-2023.08.30-000001-documents.json.bz2 List tracks $ esrally list tracks --track-path=tracks/container-logs
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
Available tracks:
Name Description Documents Compressed Size Uncompressed Size
-------------- ------------------------------------------ ----------- ----------------- -------------------
container-logs Tracker-generated track for container-logs 363,564 15.8 MB 931.1 MB
-------------------------------
[INFO] SUCCESS (took 0 seconds)
------------------------------- List races esrally list races
$ esrally race \
--target-hosts=localhost:9200 \
--track-path=tracks/container-logs \
--client-options="use_ssl:true,verify_certs:false,ca_certs:false,basic_auth_user:'elastic',basic_auth_password:'changeme'" \
--pipeline=benchmark-only \
--kill-running-processes
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Race id is [63e80dc7-a658-4b67-90c5-a73e198f62ac]
[INFO] Racing on track [container-logs] and car ['external'] with version [8.10.0-SNAPSHOT].
[WARNING] merges_total_time is 195945 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 8581 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 965545 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 254226 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 2317453 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index [100% done]
Running create-index [100% done]
Running cluster-health [100% done]
Running bulk [100% done]
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------:|-----------------:|-------:|
| Cumulative indexing time of primary shards | | 18.0194 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0.0466667 | min |
| Max cumulative indexing time across primary shards | | 5.87675 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 3.68822 | min |
| Cumulative merge count of primary shards | | 1148 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0.0114667 | min |
| Max cumulative merge time across primary shards | | 1.04943 | min |
| Cumulative merge throttle time of primary shards | | 0.259733 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0.259733 | min |
| Cumulative refresh time of primary shards | | 4.27292 | min |
| Cumulative refresh count of primary shards | | 30297 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0.0077 | min |
| Max cumulative refresh time across primary shards | | 0.879883 | min |
| Cumulative flush time of primary shards | | 38.6595 | min |
| Cumulative flush count of primary shards | | 28281 | |
| Min cumulative flush time across primary shards | | 3.33333e-05 | min |
| Median cumulative flush time across primary shards | | 0.143767 | min |
| Max cumulative flush time across primary shards | | 7.96648 | min |
| Total Young Gen GC time | | 3.002 | s |
| Total Young Gen GC count | | 144 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.996763 | GB |
| Translog size | | 0.0564239 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 652 | |
| Total Ingest Pipeline count | | 298632 | |
| Total Ingest Pipeline time | | 24.567 | s |
| Total Ingest Pipeline failed | | 0 | |
| Min Throughput | bulk | 741.76 | docs/s |
| Mean Throughput | bulk | 7042.58 | docs/s |
| Median Throughput | bulk | 7071.05 | docs/s |
| Max Throughput | bulk | 9845.35 | docs/s |
| 50th percentile latency | bulk | 3459.26 | ms |
| 90th percentile latency | bulk | 6476.21 | ms |
| 100th percentile latency | bulk | 9193.49 | ms |
| 50th percentile service time | bulk | 3459.26 | ms |
| 90th percentile service time | bulk | 6476.21 | ms |
| 100th percentile service time | bulk | 9193.49 | ms |
| error rate | bulk | 16.25 | % |
[WARNING] Error rate is 16.25 for operation 'bulk'. Please check the logs.
--------------------------------
[INFO] SUCCESS (took 80 seconds)
-------------------------------- |
Since the error rate for the operation "bulk" is quite high (16.25%), I want to try this using a full fledged cluster instead of the cluster created locally by elastic-package for development. |
Unfortunately, running rally on one of my test cluster in Elastic Cloud does not work and fails systematically:
The "bulk" step never moves beyond 0% and fails with a "network connection timed out" error. |
The problem is that my home network and Internet connection (especially the uplink speed) are a bottleneck. Increasing the timeout from 60 to 120 seconds allows Rally to complete the race, but the throughput numbers could be better. Here's the run with the client option
Maybe we should take the metworking into account. It seems this time the "bulk" task is progressing! Okay, this time the race completed, but the numbers a very low:
|
I am setting up an EC2 instance to run this test again. Copying the track to the EC2 instance: scp -i path/to/key.pem -r container-logs [email protected]:/home/ec2-user/path/to/whatever |
Now using the following setup:
esrally race \
--target-hosts=https://<REDACTED> \
--track-path=tracks/container-logs \
--client-options="use_ssl:true,basic_auth_user:'elastic',basic_auth_password:'<REDACTED>'" \
--pipeline=benchmark-only \
--kill-running-processes \
--on-error=abort
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Race id is [aa3645d4-0bb9-4918-946d-47d648e87b40]
[INFO] Racing on track [container-logs] and car ['external'] with version [8.10.0].
Running delete-index [100% done]
Running create-index [100% done]
Running cluster-health [100% done]
Running bulk [100% done]
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------:|-----------------:|-------:|
| Cumulative indexing time of primary shards | | 0.00015 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0 | min |
| Max cumulative indexing time across primary shards | | 8.33333e-05 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0 | min |
| Cumulative merge count of primary shards | | 0 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.0024 | min |
| Cumulative refresh count of primary shards | | 1448 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0 | min |
| Max cumulative refresh time across primary shards | | 0.00103333 | min |
| Cumulative flush time of primary shards | | 0.0100333 | min |
| Cumulative flush count of primary shards | | 15 | |
| Min cumulative flush time across primary shards | | 0.00025 | min |
| Median cumulative flush time across primary shards | | 0.000316667 | min |
| Max cumulative flush time across primary shards | | 0.00306667 | min |
| Total Young Gen GC time | | 0.67 | s |
| Total Young Gen GC count | | 18 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000137327 | GB |
| Translog size | | 1.1269e-06 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 4 | |
| Total Ingest Pipeline count | | 353379 | |
| Total Ingest Pipeline time | | 17.575 | s |
| Total Ingest Pipeline failed | | 0 | |
| Min Throughput | bulk | 883.99 | docs/s |
| Mean Throughput | bulk | 7917.35 | docs/s |
| Median Throughput | bulk | 8278.41 | docs/s |
| Max Throughput | bulk | 9813.75 | docs/s |
| 50th percentile latency | bulk | 3405.52 | ms |
| 90th percentile latency | bulk | 4475.19 | ms |
| 100th percentile latency | bulk | 6197.52 | ms |
| 50th percentile service time | bulk | 3405.52 | ms |
| 90th percentile service time | bulk | 4475.19 | ms |
| 100th percentile service time | bulk | 6197.52 | ms |
| error rate | bulk | 0 | % |
--------------------------------
[INFO] SUCCESS (took 82 seconds)
-------------------------------- |
I am adding support for the new routing rules in the Kubernetes integration.
Adding routing involve using the new reroute processor in the container-logs integration. We want to check the performance impact of adding it.
The text was updated successfully, but these errors were encountered: