Add event.ingested as the ingest timestamp #582
Conversation
@cwurm So together with event.created in a normal situation this would be the order of the timestamps:
Just some thoughts about a Logstash pipeline I'm using. What if an event only has second granularity? As the source of those events spawns a huge number of events, I need to use the @timestamp generated by Elasticsearch, which has millisecond granularity, and not the timestamp in the original event. Do you also consider this auto-generated @timestamp to be put in event.ingested? Or in event.created?
@willemdh Correct. Out of these three, I personally see Are you collecting it? Do you see a lot of value in it?
I'm not sure I follow. What do you mean by "second granularity"?
Actually I do use it. With second granularity I mean that the log source is sometimes not able to log the timestamp with milliseconds, for example:
When that is the case with high-speed logging sources, I need to use the @timestamp created at ingestion by Elasticsearch, otherwise all events within a second have the same @timestamp.
So where do I put the original timestamp? I can't put it in @timestamp, as I put the auto-generated timestamp (with millisecond granularity) there. But I still need it, so I created a custom field. Also, what if I am able to use the raw log timestamp, but I'd like to keep the auto-generated @timestamp? For example, for debugging what the latency is between the timestamp the log was created and the timestamp it was finally inserted into Elasticsearch. When playing around with Logstash workers for high-speed logging sources (such as a perimeter Palo Alto firewall), that can be quite useful. Sorry if I was unclear. Imho there are even more usable timestamps, and we should have a place for all of them.
Example:
In the above case
Imho having an ECS timestamp for every piece in the ingestion process would clear up confusion for everyone. Grtz
@willemdh Thanks for the details!
Makes sense. I think that's exactly where
I can see how it can be useful to know when each event was at each step of the ingest pipeline. I think it would be hard to define these so they work well in most cases - many ingestion pipelines have many steps, e.g. In the "standard" case, I think
Adding event.ingested is definitely needed, please continue with this PR. :) But I still believe there should be an official syslog timestamp field. It's the only missing field to match all syslog-related info, now that we have finally cleared up the syslog priority / facility / severity. So something like log.syslog.timestamp. I'd make a new GitHub issue for that, but if no one else cares then maybe I shouldn't.. 😃
In this scenario,
@jordansissel Not necessarily, as event.created is described as the time when the event was first read by an agent or by a pipeline. Some Beats modules add the field, but I'm not adding it in my Logstash pipeline with 'now', so the ingest timestamp is my best option. @timestamp would be equal to event.ingested (if it existed yet)...
@webmat The PR to fill |
Thanks for the PR @cwurm, and as usual thanks for the detailed thoughts @willemdh :-) An additional issue to discuss original timestamp granularity some more is welcome, thanks for bringing that to our attention.
Adding event.ingested is definitely needed, please continue with this pr
Agreed, I don't think there are any outstanding issues with the PR, other than my comment below.
In a surprising turn of events, Christoph will be out for a little bit, and he asked me to take over. So I'm providing feedback on a PR that I will finish up myself 😂
I'll see if I have access to push to Christoph's branch, I think not. I may have to start from this PR and create a new one.
schemas/event.yml
short: Ingest timestamp
description: >
  Time when the event was ingested. This is different from `@timestamp`
  which is when the event originally occurred.
I think we should be a little more explicit about the relationship with event.created as well here.
Here's what I'm thinking:
short: Timestamp when an event arrived in the central data store
description: >
Timestamp when an event arrived in the central data store.
This is different from `@timestamp`, which is when the event originally occurred.
It's also different from `event.created`, which is meant to capture the first time an agent saw the event.
In normal conditions, assuming no tampering, the timestamps should chronologically look like this: `@timestamp` < `event.created` < `event.ingested`.
Just like `@timestamp` and `event.created`, the description is pretty verbose in order to clarify the relationship. I think we should fast-track this PR, and we may not necessarily need to adjust the `@timestamp` and `event.created` descriptions for now.
@andrewkroh With Christoph out, can I elect you to double-check me on finishing up his PR? I think this is ready to merge if there's nothing major sticking out. This relates to the Beats PR elastic/beats#14001, so cc @tsg as a reviewer on the Beats PR.
LGTM
@@ -276,3 +277,18 @@
      This is mainly useful if you use more than one system that assigns
      risk scores, and you want to see a normalized value across all systems.

  - name: ingested
    level: core
extended?
I would like to introduce it directly as core. From the feedback we've seen, this timestamp is considered more useful than `event.created`.
A few reasons why that is so:
- It's another system's timestamp, which can help detect tampering of the clock on the monitored machine
- It can also be used to detect slowdowns in the overall pipeline, assuming no tampering
PRs such as elastic/beats#14001 could also help populate it broadly and reliably, without having to revisit all modules or all Beats.
If you have strong feelings and would really prefer to start by introducing it as extended, I can go with that, in order to get this in quickly. But I think it would send the wrong message with regard to this timestamp's importance vs `event.created`.
I'll wait for your response on this, and would like to merge this tomorrow if possible
SGTM
Especially in a security use case, the ingest timestamp is important. Unlike `@timestamp`, which contains the timestamp when the event originally occurred (e.g. a process was started), `event.ingested` is to contain the timestamp when an event arrived in the central data store (usually Elasticsearch). This is an important timestamp to have, since event and ingest can be far apart for various reasons.
Having the ingest timestamp in addition to the event timestamp makes it possible to verify that all data is processed as it arrives in Elasticsearch (e.g. by a scheduled query on the ingest timestamp).
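A scheduled check like the one mentioned above could be sketched as a simple range query on `event.ingested`, run periodically against `_search` on the relevant indices (this is an illustration, not part of the PR; the five-minute window is an arbitrary assumption):

```json
{
  "query": {
    "range": {
      "event.ingested": {
        "gte": "now-5m"
      }
    }
  }
}
```

Any hits are documents that arrived in the data store within the last five minutes, regardless of when their original event occurred.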
It will also allow running analyses on the relation between the event and ingest timestamps, e.g. finding events where the event timestamp is after the ingest timestamp, or significantly before it. This is useful both for security purposes (e.g. to find attackers manipulating system time) and operationally (e.g. there might be a problem in the ingest pipeline, an endpoint might be misconfigured, or an NTP server is having a bad day).
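One way to sketch such an analysis - finding events whose event timestamp is after their ingest timestamp - is a Painless script query comparing the two date fields (a hedged example, assuming both fields are mapped as `date` with doc values; the exact query shape is an illustration, not part of the PR):

```json
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['@timestamp'].value.isAfter(doc['event.ingested'].value)"
          }
        }
      }
    }
  }
}
```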
The easiest way to fill the ingest timestamp is to use an ingest processor in Elasticsearch like this (docs):
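A minimal sketch of such a pipeline uses a `set` processor with the `_ingest.timestamp` metadata field, created via `PUT _ingest/pipeline/add_event_ingested` (the pipeline name here is an arbitrary choice for illustration):

```json
{
  "description": "Sets event.ingested to the time the document hits the ingest node",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```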
This PR is adding `event.ingested` as a `core` field - the expectation being that it should be filled by all data sources (esp. security/auditing data sources), so it's possible to run any queries depending on seeing all data on it (esp. infosec queries).