Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add watermark metrics #611

Merged

Conversation

tomachine
Copy link
Contributor

@tomachine tomachine commented Aug 8, 2022

This solves #310

Adding 4 new metrics: Disk allocation watermark high, low, flood_stage and disk threshold_enabled.
Metrics support both types of disk allocation watermarks - size in bytes and in percent (shown in label of the metrics).

@tomachine tomachine force-pushed the add_watermark_metrics branch 2 times, most recently from f30db1d to d49e7ac Compare August 8, 2022 12:07
@lcaflc
Copy link

lcaflc commented Nov 24, 2022

I add support on this feature. This is an highly needed metric for the monitoring of a production cluster. This allow a clean monitoring state of the ES cluster disk usage status without the need to create host dependent disk usage alarms which is painfull with multiple cluster using different watermark levels.

ValueWithUnit func(clusterSettings ClusterSettingsResponse) (float64, string)
}

func getUnitAndValue(value string) (float64, string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any examples where the values have bytes, only percentage. Do you know which version this changed?

It is against the best practices to have units be a label. Units should always be normalized (i.e. bytes or seconds). I think it would be best to keep the percentage here as a gauge, but convert percentage to a float from 0-1.

I'm also not sure about bringing in another dependency to do the unit conversion.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sysadmind Hi, sorry for delay here. @tomachine prepared this for my issue. We aproach it this way, because this setting could be set both ways and this was sounds acceptable when he wrote this.

If you have other suggestion how to approach this, we are open to suggestions.

You can read about this in official docs and I think this was possible set to percent or absolute value in version 6. Link: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/modules-cluster.html#disk-based-shard-allocation

cluster.routing.allocation.disk.watermark.low logo cloud
(Dynamic) Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can alternatively be set to a ratio value, e.g., 0.85. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the docs, I can see where this value may be set to a byte value. The docs also mention that the value could be a ratio like 0.8 so that should also be considered.

Looking over this code again, I think overall it is probably pretty close to where it needs to be, however I'm not a fan of the added 3rd party dependency. The package does not seem to be widely used and it means this exporter has to depend on an outside party for another package. I'm also not sure, based on the tests in that package that it would properly parse a value like 500mb anyway.

@rdegez
Copy link

rdegez commented Dec 13, 2022

Would love this feature too (at least percentage) so we can have dynamic alerting threshold based on configured wattermarks and not arbitrary static ones.

@tomachine tomachine force-pushed the add_watermark_metrics branch from d49e7ac to 40d301e Compare May 11, 2023 23:26
Signed-off-by: Tomáš Kadaně <[email protected]>
@tomachine tomachine force-pushed the add_watermark_metrics branch from 40d301e to 05d44ca Compare July 14, 2023 10:52
@tomachine tomachine requested a review from sysadmind July 14, 2023 10:55
@tomachine
Copy link
Contributor Author

I rebased and removed the 3rd party dependency. Also solved the issue with value in bytes/ratio/percents watermark by two types of metrics - that will be collected according to type of watermark value.

return nil
}

func getValueInBytes(value string) float64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a followup, this should return an error so that the metric is not reported and an error is logged when we fail to parse. There are also no tests for this function or the one below to make sure that no future changes break this functionality.

@sysadmind sysadmind merged commit efd6062 into prometheus-community:master Jul 21, 2023
sysadmind added a commit to sysadmind/elasticsearch_exporter that referenced this pull request Jul 21, 2023
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric.

Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data.

Functionality originally added in prometheus-community#611

Signed-off-by: Joe Adams <[email protected]>
sysadmind added a commit to sysadmind/elasticsearch_exporter that referenced this pull request Oct 15, 2023
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric.

Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data.

Functionality originally added in prometheus-community#611

Signed-off-by: Joe Adams <[email protected]>
sysadmind added a commit that referenced this pull request Oct 15, 2023
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric.

Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data.

Functionality originally added in #611

Signed-off-by: Joe Adams <[email protected]>
jaimeyh pushed a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
Signed-off-by: Tomáš Kadaně <[email protected]>
Co-authored-by: Tomáš Kadaně <[email protected]>
jaimeyh pushed a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
…etheus-community#752)

Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric.

Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data.

Functionality originally added in prometheus-community#611

Signed-off-by: Joe Adams <[email protected]>
jaimeyh added a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
jaimeyh added a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants