-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Breaking][core.metrics] Remove process
field from ops metrics & /stats API
#104124
Comments
Pinging @elastic/kibana-core (Team:Core) |
The current Our new event loop delay collector captures this metric more accurately in a histogram throughout the duration of the process. we might wawnt to rely on that approach or move it around if needed. Related: #98673 |
👍 Added a note on this to the issue where we are discussing the new format (#104031) |
We might need to sync with Metricbeat and Monitoring when applying these changes. |
I created this issue for the Metricbeat team to track them: elastic/beats#27569 |
Also adding that this may be a breaking change to |
It looks like kibana/x-pack/plugins/monitoring/public/components/kibana/instances/instances.tsx Line 136 in b03d85a
Do we have an alternate field to get information about how much memory kibana is using? |
As said in the issue's description, aggregating some of these values don't really make sense. However, for monitoring purposes, maybe we should introduce the aggregated values for a subset of the memory information? the RSS being the most obvious example, as we can imagine monitoring processes wanted to know the total amount of memory Kibana is consuming. As this feature is a breaking change anyway, we should probably use a new section for that, such as |
I think in the end it'll be important for kibana developers and administrators to know how the app is using memory. Since this is probably common for any node.js process, I wonder what @elastic/apm-agent-node-js team has explored in that space. Having a common approach between both APM & metricbeat could be beneficial down the road. |
Nothing much at all, unfortunately. The Node.js APM agent does no instrumentation specific to the
It might be useful to add a label (via |
Does the APM UI then represent each cluster process as it's own host? and if so how do they get named? I'd be curious to see it in the UI if possible. On the kibana side do the workers have any sort of designation like maybe: web, worker, etc? Or are do we just get N generic kibana processes any time someone starts kibana? Is there a way we can run kibana in cluster mode today to see what it looks like in the OS process table maybe? Trying to wrap my head around how we can show a kibana admin real data about memory consumption without it being confusing to look at. |
We're not planning yet on having specialized workers. We'll have one |
And (stepping back from APIs) how would an operator know if the memory usage of the coordinator and workers is reasonable? Even just a bash example would be fine at this stage I think. |
As an operator, I'm not sure the memory usage of specific processes is really the first thing to monitor compared to the aggregated RSS from all processes? Which is why I was asking if it may make sense to expose some of the aggregated values (in #104124 (comment)) |
For sure. I could see a case for both. On one side you want to make sure the whole package isn't using more ram than you're willing to give it. On the other wide we probably want to be able to get performance information for each worker (especially kibana devs trying to help customers root out performance issues). |
@pgayvallet any news on what the new data from kibana is expected to look like? It'd be great to have a way to exercise the new APIs so we can start working on metricbeat/UI changes required. |
One thought is that if the aggregated API doesn't seem feasible, if we have a way metricbeat can query each subprocess and send that to a monitoring cluster, we could aggregate at query time in the UI. |
@matschaffer According to #104031, process data will be available under the process: ProcessInfo; New format: processes: ProcessInfo[]; |
Got a ping from @Bamieh that he and @lukeelmers are good with the proposal in #119560 - closing this as proposed. |
Part of #68626
Blocked by #104031
Summary
Once #104031 is complete and the existing
process
field has been deprecated from our ops metrics & /stats APIs, we'll need to remove it. This is necessary to support a multi-process Kibana as outlined in the clustering RFC: #94057Alternative
As outlined in the RFC, there is a way that we could keep the existing field in place and avoid breaking it in
8.0
, but it has significant downsides.The alternative would essentially be aggregating each of the metrics based on data reported from each worker:
size_limit
in particular could be confusingevent_loop_delay
max mostly makes sense, as we are mostly only interested in that number if it is high anywaypid
anduptime_in_millis
from the coordinator make sense, especially as long as we are killingall workers any time one of them dies. However, in the future if we respawn workers that die, this could be
misleading.
Overall, this approach means the data are less accurate and misleading, and are therefore less helpful for users. That's why making the breaking change in
8.0
would be preferred.The text was updated successfully, but these errors were encountered: