Filebeat S3 input plugin cannot parse jsonl file with content-type set as application/json #19902
Comments
Pinging @elastic/integrations-platforms (Team:Platforms)
Hi @lag13 sorry we missed the discuss issue! This does look like a bug to me so thank you for creating this issue. One question: What does AWS Cloudflare set for the |
That's fine! I'm glad you saw this one :). Cloudflare (and it's just cloudflare, not "AWS cloudflare" https://www.cloudflare.com/) sets the I'm curious as to why the code was changed in 7.7.0 to force json parsing if the Currently I'm using version 7.5.2 of filebeat to process things and that's working just fine. Thanks for getting back to me on this. Let me know if there's anything I can do to help. I'd be happy to attempt a contribution if you are swamped with other things. |
@lag13 I just created an initial PR for fixing this issue. Do you mind sending me a real log file from Cloudflare for testing? Please feel free to review/test it. Thanks!
@kaiyan-sheng Thanks so much!!! I think what you have in your PR works just fine. The important bit is those separate json objects per line. That being said, the cloudflare logs look like this:
I took a quick look at your PR. I'll try to test it out myself before tomorrow.
Hey! I had created this topic two weeks ago https://discuss.elastic.co/t/filebeat-s3-cannot-parse-jsonl-file-whos-content-type-is-set-to-application-json/239374 but no one has responded and I am convinced that this is a bug or something that should be fixed so I thought I'd try making an issue. I appreciate any feedback and sorry if this is just extra noise for y'all.
The problem is that the filebeat S3 input plugin cannot process an s3 object whose `content-type` is `application/json` AND whose content is a separate json object per line (i.e. jsonl). Processing such an object was possible until v7.7.0, when the S3 input plugin started enforcing json parsing if it saw a `content-type` of `application/json`: https://github.com/elastic/beats/blob/v7.7.0/x-pack/filebeat/input/s3/input.go#L432

Before that, json processing was ONLY controlled by the `expand_event_list_from_field` configuration, as you can see here: https://github.com/elastic/beats/blob/v7.6.2/x-pack/filebeat/input/s3/input.go#L431

Now, it's probably the case that `content-type` on the s3 object should NOT be `application/json` in the first place but sadly I do not have control over that 😦. I'm essentially dealing with the same problem as #18696 but cloudflare is the entity pushing the logs to s3 (where for him it is AWS GuardDuty) and I don't have control over how cloudflare sets `content-type`.

I had reproduced this issue by compiling filebeat off of master two weeks ago (here was the filebeat version output: `filebeat version 8.0.0 (amd64), libbeat 8.0.0 [3341c1bca5626d1ee90af692617f10f58695ed1c built 2020-06-30 20:15:28 +0000 UTC]`) and, looking at the source code today (2297636), it seems this problem is still there. Here were the steps I followed when I reproduced this issue:

The important error is the last line, which tells me that my object cannot be processed even though I think it should be processable: `s3/input.go:393 createEventsFromS3Info failed processing file from s3 bucket "lucas-test-filebeat-s3" with name "s3filebeat.log.gz": expand_event_list_from_field parameter is missing in config for application/json content-type file`.

If I try, just for fun, to include the `expand_event_list_from_field` config it will, understandably, fail to parse and we'll get the WARN log: `s3/input.go:542 decode json failed for 's3filebeat.log.gz' from S3 bucket 'lucas-test-filebeat-s3', skipping this file: json: cannot unmarshal string into Go value of type []interface {}`.

Any advice on how to proceed here? Personally I think the appropriate change would be to ignore `content-type: application/json` and have `expand_event_list_from_field` be the ONLY thing controlling whether or not to parse the object content as JSON (which is how the logic used to be) but I defer to you maintainers because you have the vision for how you want the code to behave.

I really appreciate anyone who took the time to read this and I really appreciate the creation of this feature, it's a nifty one. Cheers.