
filebeat: Add failing tests for @timestamps in JSON with timezone specified #1831

Conversation

morganchristiansson

filebeat currently generates errors for timestamps with the timezone specified as +00:00 (only the 'Z' suffix is accepted). Failing tests added.

This is the default @timestamp format used by the Kibana JSON log, for example.

Seems to have been introduced here: https://github.com/elastic/beats/pull/1421/files#r60245935

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.


@ruflin
Contributor

ruflin commented Jun 13, 2016

@morganchristiansson Thanks a lot for all the details. We will have a look to check what goes wrong here.

@tsg
Contributor

tsg commented Jun 13, 2016

Filebeat currently doesn't support a configurable timestamp format, and the current format requires the Z at the end. What we could do to support this test case without adding configurable patterns is to switch to RFC3339 as the format. That accepts both formats: https://play.golang.org/p/boxSimIhq8

We could also just make it configurable.

@morganchristiansson do you need just this format or configurable formats?

@morganchristiansson
Author

Hi.

I have no idea why filebeat should enforce the format of @timestamp when the rest of the stack seemingly supports ISO8601/RFC3339. Why do you even attempt to parse @timestamp rather than pass it to the output as a string? When I ask filebeat to parse the JSON, I do not want it to mangle our @timestamp field.

JSON parsing was only recently added, and you immediately jump to enforcing the format of @timestamp. Why? This is already handled by both Logstash and Elasticsearch; it is not the log collector's job to try to parse it!

We switched from nxlog because it was converting @timestamp to a format that Logstash/Elasticsearch wouldn't recognize, and also stripping precision from the @timestamp value, and now filebeat is going down the same route!

Also, the format you use as the example in the filebeat logging reference is RFC3339, which filebeat itself does not support, much like Kibana's timestamp format is not supported either.

filebeat: 2006-01-02T15:04:05Z07:00 <-- the 07:00 offset is not supported. Only UTC!

kibana: 2016-06-10T16:29:37+00:00 <-- Z missing and +00:00 not supported.

https://www.elastic.co/guide/en/beats/filebeat/master/configuration-logging.html#_logging_format

If you're going to enforce a format, you really should make sure your own examples use the format that all filebeat users should now be using. I shall do the same and reach out to the multiple teams that are now generating "invalid" timestamp formats, since only a subset of ISO8601/RFC3339 is now supported.

Or should we start using an alternative field and write Logstash configuration to rename it back, just so that filebeat won't muck it up? All other apps, including Kibana, seem to generate documents with an (invalid) @timestamp field.

@ruflin
Contributor

ruflin commented Jun 14, 2016

It's a very good point that we should be consistent across the stack, or at least support the other formats. I think the first step is to support both formats, which seems to be quite a simple change.

The second question is whether we should allow any kind of structure in @timestamp. If JSON is used, should we try to parse the @timestamp value if it is a string, or just leave it as it is? All the beats and Elasticsearch rely on @timestamp being a valid date field, so I think it should be a supported timestamp. Making it configurable would also require making the template potentially configurable so Elasticsearch understands the date format. I assume this adds too much complexity.

If someone does want to use their own @timestamp format, this is possible by not using keys_under_root (never tested, but this should work).
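A sketch of the configuration being referred to, using the filebeat 5.x prospector syntax mentioned later in this thread (the path is a placeholder):

```yaml
filebeat.prospectors:
  - paths:
      - /var/log/app/*.json   # placeholder path
    json:
      # Keep the parsed fields under a "json" key instead of merging them
      # into the event root, so the @timestamp inside the JSON document
      # is carried along untouched rather than parsed by filebeat.
      keys_under_root: false
```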

@morganchristiansson
Author

morganchristiansson commented Jun 15, 2016

Personally I need support for specifying timezones, both UTC and non-UTC. The +00:00/+01:00 offset format of ISO8601/RFC3339 is required.

RFC3339 format for parsing suggested by @tsg sounds great. But I suspect other users will ask for other formats.

As for the keys_under_root setting, it changes the shape of existing data and is not what I want, although with a Logstash filter to move the values under the json key back to the root it would probably work great. Will consider this.

I'm curious what the reason is for filebeat to even parse @timestamp, and what it does with this value. Why not just pass the value on to the output as is?

@morganchristiansson
Author

morganchristiansson commented Jun 16, 2016

My dislike for handling dates and times only increases. Only somewhat related, but I thought I'd share. From the PHP manual:

[image: "ISO860 in PHP"]

Source: http://www.reddit.com/r/lolphp/comments/4oacnc/_/

@ruflin
Contributor

ruflin commented Jun 16, 2016

@morganchristiansson The reason it is parsed is mainly historical. All other implementations actually generate the timestamp, and internally we use a fixed type to handle this field. So if it were not converted, we would have to change larger parts of the code base. That doesn't mean it shouldn't be changed; at the same time, since this is almost the only field we require, I feel we should be strict about its type.

@morganchristiansson
Author

It would be useful to record when the log was processed by filebeat in a secondary field like beat.timestamp.

Parsing dates is a can of worms best avoided or delegated to libraries that specialise in it.

@ruflin
Contributor

ruflin commented Jun 17, 2016

We first had that in timestamp, and LS then added @timestamp, which became really confusing. Now that we have metadata under beat, this could be worth discussing again. The problem is that in some cases the log lines are sent directly to Elasticsearch, and then @timestamp would be missing.

@ruflin mentioned this pull request on Jun 20, 2016
@ruflin
Contributor

ruflin commented Jun 20, 2016

@morganchristiansson Could you have a look at this PR? #1889 I took your changes / tests and built on top the necessary changes.

@ruflin
Contributor

ruflin commented Jun 20, 2016

@morganchristiansson BTW: Could you sign the CLA so I can take your code in my PR? https://www.elastic.co/contributor-agreement/

ruflin added a commit to ruflin/beats that referenced this pull request Jun 20, 2016
Allow timestamps without timezone. See elastic#1831 for more details
@andrewkroh added the "needs CLA" label (User must sign the Elastic Contributor License before review) on Jun 20, 2016
@andrewkroh
Member

Incorporated into #1889

@andrewkroh closed this on Jun 20, 2016
andrewkroh pushed a commit that referenced this pull request Jun 21, 2016
* Update testcases to test timestamps with timezone specified

This is the default @timestamp format used by kibana json log for example

* Make timestamp conversion based on RFC3339

Allow timestamps without timezone. See #1831 for more details
@morganchristiansson
Author

morganchristiansson commented Jun 27, 2016

I've implemented this workaround/solution to fix the events in Logstash after they've left the filebeat pipeline.

filter {
  # Parse the JSON-supplied @timestamp (full ISO8601, not just RFC3339)
  if [json][@timestamp] {
    date { match => ["[json][@timestamp]", "ISO8601"] }
  }
  # Promote the remaining fields from [json] to the event root
  if [json] {
    ruby {
      code => "event['json'].each { |k,v| event[k]=v unless k == '@timestamp' } ; event.remove('json')"
    }
  }
}

and then set filebeat.prospectors[].json.keys_under_root: false everywhere.

@ruflin
Contributor

ruflin commented Jun 27, 2016

@morganchristiansson Thanks for sharing. I'm not 100% sure in which cases this is needed?

@morganchristiansson
Author

morganchristiansson commented Jun 29, 2016

It's a workaround while waiting for the filebeat-5.0.0-alpha4 release, and it also handles ISO8601 dates that aren't RFC3339 dates, such as timezone offsets with a missing colon (+0000).

I also get the behaviour I initially expected from filebeat with this fix/workaround.

@ruflin
Contributor

ruflin commented Jul 4, 2016

@morganchristiansson Got it. Thanks for the details.

Labels
enhancement Filebeat Filebeat needs CLA User must sign the Elastic Contributor License before review. Pioneer Program