
Allow to overwrite @timestamp with different format #11273

Closed
mobidyc opened this issue Mar 15, 2019 · 17 comments
Labels
needs_team (Indicates that the issue/PR needs a Team:* label), Stalled

Comments


mobidyc commented Mar 15, 2019


Using Filebeat to parse log lines like this one:

{"host":"s3-ssl-conn-0.localdomain","service":"sfused","instance":"unconfigured","pid":31737,"trace_type":"op","trace_id":7257574788016540,"span_id":1918366419434151,"parent_span_id":4632228147107467,"@timestamp":"2019-03-15T19:41:07.282853+0000","start_time":1552678867282.853,"end_time":1552678867283.062,"duration_ms":0.208984,"op":"service","layer":"workers_arc_sub","error":false,"cancelled":false,"tid":32534}

Filebeat returns an error, as you can see in the following log:

2019-03-15T19:41:11.564Z        ERROR   jsontransform/jsonhelper.go:53  JSON: Won't overwrite @timestamp because of parsing error: parsing time "2019-03-15T19:41:07.175876+0000" as "2006-01-02T15:04:05Z07:00": cannot parse "+0000" as "Z07:00"

I use a template file where I define that the @timestamp field is a date:

{
    "mappings": {
        "doc": {
            "properties": {
                "layer": {
                    "type": "keyword"
                }, 
                "ip_addr": {
                    "type": "ip"
                }, 
                "string": {
                    "type": "text"
                }, 
                "service": {
                    "type": "keyword"
                }, 
                "@timestamp": {
                    "type": "date"
                }, 
                "parent_span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "trace_type": {
                    "type": "keyword"
                }, 
                "trace_id": {
                    "type": "long"
                }, 
                "label": {
                    "type": "keyword"
                }, 
                "ip_port": {
                    "type": "long"
                }, 
                "instance": {
                    "type": "keyword"
                }, 
                "host": {
                    "type": "keyword"
                }, 
                "num": {
                    "type": "keyword"
                }, 
                "end_time": {
                    "type": "double"
                }, 
                "key": {
                    "type": "keyword"
                }, 
                "error": {
                    "type": "boolean"
                }, 
                "cancelled": {
                    "type": "boolean"
                }, 
                "path": {
                    "type": "text"
                }, 
                "span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "start_time": {
                    "type": "double"
                }, 
                "op": {
                    "type": "keyword"
                }
            }
        }
    }, 
    "template": "app-traces-*", 
    "settings": {
        "index.refresh_interval": "30s"
    }
}

ruflin commented Mar 18, 2019

I would think using format for the date field should solve this? https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
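The mapping-level change being suggested here would look roughly like this (a sketch only; the Java-time pattern is an assumption chosen to match the "+0000" offset, and as the comments below note, overriding @timestamp in the Beats-managed template turned out not to be workable):

```json
"@timestamp": {
    "type": "date",
    "format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ||strict_date_optional_time"
}
```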

Closing this for now as I don't think it's a bug in Beats.

@ruflin ruflin closed this as completed Mar 18, 2019

mobidyc commented Mar 18, 2019

It does not work, as it seems it is not possible to overwrite the date format.
see https://discuss.elastic.co/t/cannot-change-date-format-on-timestamp/172638


ruflin commented Mar 19, 2019

I now see that you are trying to overwrite the existing timestamp. We should probably rename this issue to "Allow to overwrite @timestamp with different format" or something similar.

As a workaround, could you name the field differently in your JSON log file and then use an ingest pipeline to move the original @timestamp aside (we often rename it to event.created) and promote your timestamp to @timestamp? That is what we do in quite a few modules.
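That ingest-pipeline workaround could be sketched like this (the log_time field name is an assumption for whatever the application would write instead of @timestamp; note the date processor writes to @timestamp by default):

```json
{
    "description": "Sketch: preserve the shipper timestamp, promote the log's own timestamp",
    "processors": [
        { "rename": { "field": "@timestamp", "target_field": "event.created", "ignore_missing": true } },
        { "date": { "field": "log_time", "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"] } }
    ]
}
```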

@ruflin ruflin reopened this Mar 19, 2019

mobidyc commented Mar 20, 2019

Hello,

Unfortunately no, it is not possible to change the code of the distributed system which populates the log files, and it is not possible to change the tools which consume the Elasticsearch data either, as I do not control them (so renaming is not possible). The log harvester has to grab the log lines and send them in the desired format to Elasticsearch. (I have the same problem with a "host" field in the log lines.) It is a regression, as it worked very well in Filebeat 5.x, but I understand that the issue comes from Elasticsearch and the mapping types.

Right now I am looking at writing my own log parser and sending data directly to Elasticsearch (I don't want to use Logstash, for numerous reasons), so I have one request: could you document somewhere the reserved field names we cannot overwrite (like the @timestamp format, the host field, etc.)?
It could save a lot of time for people trying to do something that is not possible.


mobidyc commented Mar 20, 2019

Additionally, ingest pipelines are too resource-consuming: I have too much data, and the processing time introduces too much latency for the millions of log lines the application produces.

@mobidyc mobidyc changed the title Incompatible timestamp format Allow to overwrite @timestamp with different format Mar 20, 2019

ruflin commented Mar 22, 2019

With 7.0 we are switching to ECS, which should mostly solve the problem around conflicts: https://github.com/elastic/ecs. Unfortunately there will always be a chance of conflicts: if you use foo today and we start using foo.bar in the future, there will be a conflict for you.

What I don't fully understand is if you can deploy your own log shipper to a machine, why can't you change the filebeat config there to use rename?

I'm curious to hear more about why using simple pipelines is too resource-consuming. Did you run some comparisons here?

@takeseem

I have the same problem.
I feel the Elastic folks are being a little arrogant about this problem.

@andrewkroh (Member)

We have added a timestamp processor that could help with this issue. You can tell it what field to parse as a date and it will set the @timestamp value.

It doesn't directly help when you're parsing JSON containing @timestamp with Filebeat and trying to write the resulting fields into the root of the document. But you can work around that by not writing into the root of the document, applying the timestamp processor, and then moving some fields around.
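A sketch of that workaround in Filebeat config terms (the path and the drop_fields step are assumptions; the layout is Go reference-time notation chosen to match the "+0000" offset in the log line above):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/traces.json        # hypothetical path
    json.keys_under_root: false         # decoded object lands under "json", not the document root

processors:
  - timestamp:
      field: json.@timestamp
      layouts:
        - '2006-01-02T15:04:05.999999-0700'   # matches e.g. 2019-03-15T19:41:07.282853+0000
  - drop_fields:
      fields: ["json.@timestamp"]
      ignore_missing: true
```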


@sselberg

This is caused by the fact that the "time" package that Beats uses [1] to parse @timestamp from JSON doesn't honor the RFC3339 spec [2] (specifically, the part that says that both "+dd:dd" and "+dddd" are valid time zone offsets). So some timestamps that follow RFC3339 (like the one above) cause a parse failure when parsed with:

ts, err := time.Parse(time.RFC3339, vstr)

[1] jsontransform/jsonhelper.go in the Beats source (the line shown above)
[2] golang/go#31113

@sselberg

> This is caused by the fact that the "time" package that beats is using [1] to parse @timestamp from JSON doesn't honor the RFC3339 spec [2] […]

Seems like I read the RFC3339 spec too hastily; the part where ":" is optional is from the appendix that describes ISO 8601.

@mattiagalati

> We have added a timestamp processor that could help with this issue. […]

Would it be possible to have a hint about how to do that? It seems Filebeat prevents renaming the @timestamp field when used with json.keys_under_root: true.

In my company we would like to switch from Logstash to Filebeat, and we already have tons of logs with a custom timestamp that Logstash handles without complaining; the same format causes trouble in Filebeat.


morganchristiansson commented Oct 26, 2020

You can disable JSON decoding in filebeat and do it in the next stage (logstash or elasticsearch ingest processors).

@mattiagalati

> You can disable JSON decoding in filebeat and do it in the next stage (logstash or elasticsearch ingest processors).

It seems a bit odd to have a powerful tool like Filebeat and discover it cannot replace the timestamp. I mean: storing the timestamp itself in the log row is the simplest way to ensure the event keeps its consistency even if my Filebeat suddenly stops or Elastic is unreachable; plus, using a JSON string as a log row is one of the most common patterns today.
I wonder why no one at Elastic took care of it.

Using an ingest pipeline forces me to learn and add another layer to my Elastic stack, and IMHO that is a ridiculous tradeoff just to accomplish a simple task.

For now, I just forked the beats source code to parse my custom format.

@gavin-orange

filebeat.inputs:
- type: log
  paths:
#    - /usr/local/CCS-20201207161336.txt
#    - /usr/local/testlog4.log
#    - /usr/local/access.log
  - /root/json/json.log
#  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_keys: true
#output.logstash:
#  hosts: ["0.0.0.0:5044"]

setup.kibana:
  hosts: "10.239.113.77:5601"

setup.ilm.enabled: false
setup.template.overwrite: true
setup.template.name: "nginx"
setup.template.pattern: "nginx-*"

output.elasticsearch:
  hosts: ["10.239.113.77:9200"]
  index: "nginx-sz-%{+yyyy-MM}"

processors: 
  - add_locale: ~
  - timestamp:
      field: json.@timestamp
      layouts:
        - '08/Jan/2021:13:38:30 Z'
        

Not working ...
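A likely reason the config above fails: the Beats timestamp processor expects layouts written in Go's reference-time notation (Mon Jan 2 15:04:05 MST 2006), not a literal sample timestamp. A corrected sketch for an nginx-style time such as 08/Jan/2021:13:38:30 +0000 (the field name is taken from the config above; the exact offset form in the real logs is an assumption):

```yaml
processors:
  - add_locale: ~
  - timestamp:
      field: json.@timestamp
      layouts:
        - '02/Jan/2006:15:04:05 -0700'   # Go reference layout, not an example value
```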


botelastic bot commented Dec 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the needs_team (Indicates that the issue/PR needs a Team:* label) and Stalled labels Dec 13, 2021

botelastic bot commented Dec 13, 2021

This issue doesn't have a Team:<team> label.

@botelastic botelastic bot closed this as completed Jan 12, 2022