Expose estimated disk usage and watermark information in nodes stats API #8686
Comments
+1 for exposing this in node stats, way better than nagging in the logs
Brainstorming here, but maybe we should have an allocation status API, where each decider can return an explanation plus structured flags (allow_all, allow_primary, allow_replica) on a cluster view, node view, or index view (e.g. when the concurrent recoveries limit is breached, ...). If it's provided with a specific shard, then maybe we can give details on that specific shard?
@kimchy +1 on an allocation status API. I think we should separate the two (do both, but separately): the disk usage percentage and whether the watermarks have been passed should be exposed via the nodes stats API as part of FsStats as a first step, then we can add the allocation status API as an additional step.
@dakrone ++
@dakrone are you planning on returning to this one at some stage?
@clintongormley yes, and I've updated the title of this issue to reflect the actual work.
This exposes the least and most used disk usage estimates within the "fs" nodes stats output:

```json
GET /_nodes/stats/fs?pretty&human
{
  "nodes" : {
    "34fPVU0uQ_-wWitDzDXX_g" : {
      "fs" : {
        "timestamp" : 1481238723550,
        "total" : {
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "free" : "140.6gb",
          "free_in_bytes" : 151068725248,
          "available" : "120.5gb",
          "available_in_bytes" : 129438912512
        },
        "least_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "most_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "data" : [{...}],
        "io_stats" : {...}
      }
    }
  }
}
```

Resolves elastic#8686
Hi all, sorry to comment on this blast from the past, but I've recently found myself needing this type of functionality. Was the PR accidentally closed instead of accepted or something? It looks like everyone was in agreement, a PR was made, and then it was closed.
I'm not sure why it's marked as closed.
This information is also available in the allocation explain API if you pass the `include_disk_info` parameter.
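As a rough illustration, here is a minimal sketch of querying the allocation explain API with disk information included, assuming a local cluster on `localhost:9200` and a hypothetical index name; the exact shape of the response's disk section may vary by version:

```python
# Sketch only: asks the allocation explain API to include per-node disk usage.
# "my-index" is a hypothetical index; adjust host, index, shard as needed.
import requests

resp = requests.get(
    "http://localhost:9200/_cluster/allocation/explain",
    params={"include_disk_info": "true"},
    json={"index": "my-index", "shard": 0, "primary": True},
)
resp.raise_for_status()
explanation = resp.json()

# With include_disk_info set, the response carries a cluster_info section with
# node disk usage estimates alongside the allocation decision.
print(explanation.get("cluster_info", {}))
```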
GitHub only marks squashed commits as "Merged" if you used the merge button; otherwise they are marked as closed. I probably merged it manually (or hey, maybe the merge button wasn't even around back then).
Ah, my bad, thanks for taking the time to respond. We don't see this in our 6.8.1 clusters, so I was confused and thought it didn't make it in. My issue must be something else.
@godber this came up again internally and we found that these stats are indeed sometimes (often) missing or stale, so that might explain your issue. The fix would have been rather convoluted, and the correct stats are available from the allocation explain API and/or can be computed from the stats API response, so we've reverted this change in #59755.
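For reference, a sketch of computing the per-node used-disk percentage yourself from the regular nodes stats response, assuming a local cluster and that total and available byte counts are reported for every node:

```python
# Sketch: derive a used-disk percentage per node from the fs section of nodes stats.
import requests

stats = requests.get("http://localhost:9200/_nodes/stats/fs").json()
for node_id, node in stats["nodes"].items():
    total = node["fs"]["total"]["total_in_bytes"]
    available = node["fs"]["total"]["available_in_bytes"]
    used_percent = 100.0 * (total - available) / total
    print(f"{node.get('name', node_id)}: {used_percent:.2f}% used")
```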
@DaveCTurner thanks for remembering me and pinging me here!
We currently log whether a node is over the high or low watermark once every 30 seconds. To be more efficient and reduce the volume of logs generated, we should log only once (using a latch) when a watermark is exceeded, and once when disk usage goes back under the watermark.
We should also expose whether a node is above the watermarks in the nodes stats.
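A minimal sketch of the latch idea described above, written in Python rather than the actual Java monitor, just to illustrate logging only on state transitions instead of on every periodic check:

```python
# Sketch of "log once per transition": the boolean flag acts as the latch.
import logging

logger = logging.getLogger("disk-watermark")
over_high_watermark = False

def check_disk(used_percent: float, high_watermark: float = 90.0) -> None:
    """Called periodically (e.g. every 30 seconds) with the current disk usage."""
    global over_high_watermark
    if used_percent >= high_watermark and not over_high_watermark:
        over_high_watermark = True
        logger.warning("high disk watermark [%.0f%%] exceeded (%.1f%% used)",
                       high_watermark, used_percent)
    elif used_percent < high_watermark and over_high_watermark:
        over_high_watermark = False
        logger.info("disk usage back below high watermark [%.0f%%] (%.1f%% used)",
                    high_watermark, used_percent)
```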