Support for event.ingested and event.processed #453
Comments
It's something we discussed in the past, and it goes a bit into event tracing. I can definitely see the value of these "measurements" on the event. I wonder if we could come up with a more generic way of storing multiple timestamps, so that everyone could add as many timestamps as they need.
Well, there is the …
My assumption is that something similar exists in Logstash. The challenge I see with this is that we lose the context / meaning of each timestamp. It will also make it impossible to query on one specific timestamp, for example …
Yes, I also see a need to provide guidance on how to track pipeline metadata. Timings like you describe here are part of it. We could also have a specific place for errors at the various stages. I agree that keeping all of the context about each step will be a challenge: an array of keywords will lose this context, whereas an array of objects is difficult to use with Kibana... So stay tuned for when we start tackling this, and suggestions are welcome. In the meantime, don't forget that you can already track this the way you want, with custom fields. After experimenting with your use case, further feedback on your findings will be welcome :-)
Just thinking about some issues, independent of how the storage is actually done (feel free to ignore the spam):
Not spam at all. Thanks for the thoughts :-) Re-reading your initial issue, I like the approach of focusing on the logical progression of the event through the pipeline, rather than on the concrete steps of an arbitrary pipeline. The latter is how I had been thinking about it initially. But if we go with logical progression, we may be able to come up with a straightforward proposal for these timestamps (and other metadata), which users will be able to map onto their concrete topologies.
Well, after rethinking my use case and my initial proposal, I'm not sure that this is the right way to go. I'm basically interested in the average (and min/max/stddev/histogram/...) delays through the whole chain, but not for individual events. The data-to-metadata ratio is already bad enough for certain documents, so storing this information with the documents would make it even worse, and I don't see any use case for keeping that information. Instead, I'd rather forward that information along with the events and process it directly in the endpoint. Though similar, the proposal really covers two different use cases:
Each supporting component along the path might add timing information as metadata to events, at any depth it sees fit (e.g. Logstash processors). Even network delays could be calculated that way:
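For illustration only, here is a hypothetical event carrying such per-hop metadata under a made-up `pipeline.hops` key; none of these field names exist in ECS, and all values are invented:

```
{
  "@timestamp": "2019-03-01T12:00:00.000Z",
  "message": "original log line",
  "pipeline": {
    "hops": [
      { "component": "filebeat/host-a",    "received": "2019-03-01T12:00:00.120Z", "forwarded": "2019-03-01T12:00:00.180Z" },
      { "component": "logstash/ls-1",      "received": "2019-03-01T12:00:00.450Z", "forwarded": "2019-03-01T12:00:00.700Z" },
      { "component": "elasticsearch/es-1", "received": "2019-03-01T12:00:01.100Z" }
    ]
  }
}
```

The gap between one hop's `forwarded` and the next hop's `received` would approximate the network delay between the two components.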
Elasticsearch, as the last component in the path, should then extract that information from incoming events and remove the metadata afterwards. The processed information could be exposed by introducing a new stats endpoint. When the data structure correctly represents the topology, one could even draw topology diagrams from it automatically.
The second use case is really about ingest pipeline profiling, where each pipeline and processor adds the timing information as mentioned above. While the node stats endpoint already provides total count and total processing time, it currently neither covers the actual topology (for example, sub-pipelines do not contain their processors and possibly another level of sub-pipelines), nor does it calculate the actual delays. Additionally, it might be helpful not only to provide overall stats for pipelines and processors, but also to introduce another granularity level, like stats per 'document class' (configurable by the user, might default to …). Maybe both use cases fit into the same data structure, maybe not. What do you think?
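For reference, the totals mentioned above come from the ingest section of the node stats API (`GET _nodes/stats/ingest`). An abridged, illustrative response might look roughly like this; the node ID, pipeline name, and numbers are placeholders:

```
{
  "nodes": {
    "node-id-placeholder": {
      "ingest": {
        "total": { "count": 12345, "time_in_millis": 678, "current": 0, "failed": 0 },
        "pipelines": {
          "my-pipeline": { "count": 12345, "time_in_millis": 678, "current": 0, "failed": 0 }
        }
      }
    }
  }
}
```

Per-pipeline totals are available here, but, as noted above, the nested sub-pipeline and processor topology is not represented.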
Hm, this still doesn't give the whole picture as I see it. Let's try it a different way: have you ever seen or used any network discovery tools? They actively scan and passively listen to your networks and, when you give them access to your network devices, extract all possible useful (or not so useful) information from them. In the end you get a more or less nice infrastructure diagram, showing all your network devices and their relationships (aka interconnections), as well as all sorts of network and device information and operational statistics.

I'd like to propose something similar for the Elastic Stack; call it "automatic infrastructure discovery based on event flows" or something like that. Not by actively scanning anything, but by simply collecting infrastructure information from events, added sequentially by participating components along the event path. The nice thing is that you don't need to do anything special on the Elasticsearch side, just look for and extract the infrastructure information (and of course process it, but that should be simple ;). That could be as simple as just the event-processing component identifier and timing information as mentioned in my last comment, and could grow to whatever makes sense for a component (e.g. pipeline profiling data in Logstash or Elasticsearch itself).

The components do not even need to add their information to all processed events; based on the current event throughput this could be dynamically adjusted, e.g. only every tenth event (it is therefore important to note that this will not be like time series data, as the event flow is not predictable). When a component is not aware of the discovery mechanism, it will simply forward the event without adding anything useful to it (this is an important difference from using separate probe events, where all components along the path must not only support them but also route them correctly, e.g. to different Elasticsearch clusters).

One day I'd like to see a nice, configurable dashboard in Kibana, showing me the complete event processing infrastructure per event flow, all component information and statistics (like events/s or delay histograms), and their relationships. I should be able to identify any operational issues at a glance (maybe in combination with Watcher), for example event spikes or an extraordinary delay in event processing due to GC. In such cases it would be useful to dig into a component further down (when available), e.g. to get more component statistics or to profile pipelines. The possibilities are really open to whatever someone could imagine.

I know that this is a lot different from my initial request, and that this is no longer the right place to discuss it. Should we move that over to Elasticsearch? What do you suggest?
To add another use case: we have data from laptops coming in with unpredictable delays, due to factors such as being offline. We want alerts on new data as it comes in, so we think that adding a field at ingest time (with an Elasticsearch ingest processor or with Logstash) would allow us to write alerts like "in the past 5 minutes, were any events received that match [some condition]?"
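As a sketch of that approach (the pipeline name, index pattern, and filter condition are placeholders), an ingest pipeline can stamp the ingest time onto each event with a `set` processor:

```
PUT _ingest/pipeline/stamp-ingest-time
{
  "description": "Record when the event was received for indexing",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

An alert condition such as "events received in the last 5 minutes that match some condition" can then be expressed as a filtered query on that field:

```
GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "event.ingested": { "gte": "now-5m" } } },
        { "term": { "some.field": "some-value" } }
      ]
    }
  }
}
```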
Does #582 close this?
Not fully. We've merged and released `event.ingested`.
We created meta-issue #940 to discuss capturing pipeline details in ECS. Since one of the two timestamps has already been added, and the other is related to the meta-issue, I'm closing in favor of the meta-issue.
`event.created` is described as the time when the event was first read by an agent or by a pipeline. In a distributed setup, however, that means information is lost about the time taken to process an event at the various stages. Would it be possible to add new fields to `event` to keep that information? For example, `event.processed` could hold the time when e.g. Logstash (the last one in a chain, or maybe an array) actually processes the event, and `event.ingested` the time when an ingest pipeline handles it. Does this make sense?
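To make the proposal concrete, a hypothetical document carrying all three timestamps might look like this (`event.processed` is not an ECS field, and all values are invented):

```
{
  "@timestamp": "2019-03-01T12:00:00.000Z",
  "message": "original log line",
  "event": {
    "created": "2019-03-01T12:00:00.250Z",
    "processed": "2019-03-01T12:00:01.100Z",
    "ingested": "2019-03-01T12:00:01.400Z"
  }
}
```

Here `event.created` would be set by the agent that first read the event, `event.processed` by the last processing stage (e.g. Logstash), and `event.ingested` by the ingest pipeline.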