-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add watermark metrics #611
add watermark metrics #611
Conversation
f30db1d
to
d49e7ac
Compare
I add support on this feature. This is an highly needed metric for the monitoring of a production cluster. This allow a clean monitoring state of the ES cluster disk usage status without the need to create host dependent disk usage alarms which is painfull with multiple cluster using different watermark levels. |
collector/cluster_settings.go
Outdated
ValueWithUnit func(clusterSettings ClusterSettingsResponse) (float64, string) | ||
} | ||
|
||
func getUnitAndValue(value string) (float64, string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any examples where the values have bytes, only percentage. Do you know which version this changed?
It is against the best practices to have units be a label. Units should always be normalized (i.e. bytes or seconds). I think it would be best to keep the percentage here as a gauge, but convert percentage to a float from 0-1.
I'm also not sure about bringing in another dependency to do the unit conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sysadmind Hi, sorry for delay here. @tomachine prepared this for my issue. We aproach it this way, because this setting could be set both ways and this was sounds acceptable when he wrote this.
If you have other suggestion how to approach this, we are open to suggestions.
You can read about this in official docs and I think this was possible set to percent or absolute value in version 6. Link: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/modules-cluster.html#disk-based-shard-allocation
cluster.routing.allocation.disk.watermark.low logo cloud
(Dynamic) Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can alternatively be set to a ratio value, e.g., 0.85. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the docs, I can see where this value may be set to a byte value. The docs also mention that the value could be a ratio like 0.8
so that should also be considered.
Looking over this code again, I think overall it is probably pretty close to where it needs to be, however I'm not a fan of the added 3rd party dependency. The package does not seem to be widely used and it means this exporter has to depend on an outside party for another package. I'm also not sure, based on the tests in that package that it would properly parse a value like 500mb
anyway.
Would love this feature too (at least percentage) so we can have dynamic alerting threshold based on configured wattermarks and not arbitrary static ones. |
d49e7ac
to
40d301e
Compare
Signed-off-by: Tomáš Kadaně <[email protected]>
40d301e
to
05d44ca
Compare
I rebased and removed the 3rd party dependency. Also solved the issue with value in bytes/ratio/percents watermark by two types of metrics - that will be collected according to type of watermark value. |
return nil | ||
} | ||
|
||
func getValueInBytes(value string) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup, this should return an error so that the metric is not reported and an error is logged when we fail to parse. There are also no tests for this function or the one below to make sure that no future changes break this functionality.
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric. Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data. Functionality originally added in prometheus-community#611 Signed-off-by: Joe Adams <[email protected]>
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric. Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data. Functionality originally added in prometheus-community#611 Signed-off-by: Joe Adams <[email protected]>
Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric. Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data. Functionality originally added in #611 Signed-off-by: Joe Adams <[email protected]>
Signed-off-by: Tomáš Kadaně <[email protected]> Co-authored-by: Tomáš Kadaně <[email protected]>
…etheus-community#752) Improve the error handling for watermark metrics. When we have an error, do not report a 0 value for the metric. Adds tests for parsing the ratio/percentage and the human readable bytes in watermark data. Functionality originally added in prometheus-community#611 Signed-off-by: Joe Adams <[email protected]>
This reverts commit bf2f757.
This reverts commit bf2f757.
This solves #310
Adding 4 new metrics: Disk allocation watermark high, low, flood_stage and disk threshold_enabled.
Metrics support both types of disk allocation watermarks - size in bytes and in percent (shown in label of the metrics).