Instrument the operator with metrics #212
Comments
Basic metrics set up in PR #214.
@donbowman that exporter is for the actual ES clusters themselves, whereas this issue is about instrumenting the operator. That said, since that ES exporter installs as a plugin, you can install it with an init container as described here:
It would be very useful to have an instrumented Elasticsearch client and gather metrics about the API calls. One of the observations from scale testing (#357) was that the operator seems to spend most of its time on API calls when managing a large number of Elasticsearch clusters. Having the metrics to back up these observations can help us measure the effects of any optimization efforts on that end.
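As a rough illustration of what that could look like (not the operator's actual code; the metric, type, and label names below are made up), the Elasticsearch client's `http.RoundTripper` could be wrapped so that every API call is timed and labeled with the target cluster:

```go
package esclient

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histogram of Elasticsearch API call durations, labeled with
// the target cluster and the HTTP method.
var esRequestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "elasticsearch_client_request_duration_seconds",
		Help:    "Duration of Elasticsearch API requests issued by the operator.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"cluster", "method"},
)

func init() {
	prometheus.MustRegister(esRequestDuration)
}

// instrumentedRoundTripper times every request going through the ES client.
type instrumentedRoundTripper struct {
	cluster string
	next    http.RoundTripper
}

func (rt *instrumentedRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := rt.next.RoundTrip(req)
	esRequestDuration.WithLabelValues(rt.cluster, req.Method).Observe(time.Since(start).Seconds())
	return resp, err
}
```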
Relates to #1189.
Usage data about elastic-licensing might also be worth exposing as Prometheus metrics for admins.
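For example (purely illustrative metric and label names, assuming prometheus/client_golang), a gauge per license level could expose the memory total that licensing usage is computed from:

```go
package licensing

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical gauge exposing licensing usage data; metric and label names
// are illustrative only.
var licensedMemoryGiB = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "elastic_licensing_memory_gibibytes",
		Help: "Total memory of managed Elastic Stack resources, as counted for licensing purposes.",
	},
	[]string{"license_level"},
)

func init() {
	prometheus.MustRegister(licensedMemoryGiB)
}

// reportUsage would be called whenever licensing resource usage is recomputed.
func reportUsage(licenseLevel string, totalMemoryGiB float64) {
	licensedMemoryGiB.WithLabelValues(licenseLevel).Set(totalMemoryGiB)
}
```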
We have controller-runtime metrics as of #214, and the Elasticsearch client is instrumented as of #1189. The question about support for Prometheus histograms is answered as of Elasticsearch 7.6 with the new histogram field mapper. I just created #3140 to follow up on the last comment here and am suggesting to close this issue for now. Please reopen if you disagree.
Metrics we're interested in
ES communication metrics (labeled with the stack name):
K8S requests metrics:
Optional (to discuss?):
Metrics collector
1. prometheus lib instrumentation <- metricbeat -> Elasticsearch
Rates are usually expressed with counters, which we visualize by applying a `rate()` function. Counters can be reset when the process restarts. How well does that fit ES/Kibana? We can use a derivative aggregation, but does it handle restarts well? Edit: yes, we can tweak this in the TSVB (Time Series Visual Builder): https://www.youtube.com/watch?v=CNR-4kZ6v_E
Latencies are usually histograms, with values falling into buckets that we define. Not sure how to visualize that with ES/Kibana.
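A minimal sketch of this option, assuming prometheus/client_golang (metric names and buckets are placeholders, not a decided design): a counter for rates and a histogram with explicitly defined buckets for latencies, to be scraped or shipped by metricbeat.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Illustrative metrics for option 1. The counter is visualized with rate(),
// the histogram uses buckets that we define up front.
var (
	reconciliationsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "operator_reconciliations_total",
			Help: "Total number of reconciliation loop executions.",
		},
		[]string{"stack"},
	)

	reconciliationDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "operator_reconciliation_duration_seconds",
			Help:    "Duration of reconciliation loop executions.",
			Buckets: []float64{0.1, 0.5, 1, 2.5, 5, 10, 30},
		},
		[]string{"stack"},
	)
)

func init() {
	prometheus.MustRegister(reconciliationsTotal, reconciliationDuration)
}
```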
2. go-metrics <- metricbeat -> Elasticsearch
Main benefit: histograms are simpler than the Prometheus alternative, since they emit avg and percentile values directly.
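A sketch of the same timing with go-metrics (github.com/rcrowley/go-metrics; names are illustrative): the timer sample already carries mean and percentiles, which metricbeat could ship to Elasticsearch as-is.

```go
package metrics

import gometrics "github.com/rcrowley/go-metrics"

// Illustrative go-metrics timer: no buckets to define, the sample itself
// exposes pre-aggregated values (mean, percentiles, rates).
var reconciliationTimer = gometrics.NewRegisteredTimer(
	"operator.reconciliation.duration", gometrics.DefaultRegistry)

// timeReconciliation records the duration of one reconciliation loop execution.
func timeReconciliation(reconcile func()) {
	reconciliationTimer.Time(reconcile)
}

// snapshot shows the values emitted directly by the library.
func snapshot() (mean, p95 float64) {
	return reconciliationTimer.Mean(), reconciliationTimer.Percentile(0.95)
}
```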
3. logs <- filebeat -> Elasticsearch
If each reconciliation loop execution is logged anyway, it's quite easy to include all metrics in the logs we produce and build dashboards with that. In other words: leave the entire aggregation to ES, don't pre-aggregate.
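A sketch of that approach, assuming structured logging via controller-runtime's logr (field names are made up for illustration): each reconciliation logs its own measurements, filebeat ships them to Elasticsearch, and all aggregation happens at query time.

```go
package controller

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/log"
)

// Illustrative only: emit per-reconciliation measurements as structured log
// fields and leave the aggregation entirely to Elasticsearch.
func logReconciliation(stack string, start time.Time, esAPICalls int, err error) {
	log.Log.WithName("reconciler").Info("reconciliation finished",
		"stack", stack,
		"duration_seconds", time.Since(start).Seconds(),
		"es_api_calls", esAPICalls,
		"error", err != nil,
	)
}
```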