[filebeat] Add option to dedot keys to decode_json_fields processor #26154

Open
OranShuster opened this issue Jun 6, 2021 · 11 comments
Labels
Stalled Team:Elastic-Agent Label for the Agent team Team:Integrations Label for the Integrations team

Comments

@OranShuster

Describe the enhancement:
We currently use Filebeat to send k8s audit logs to ES.
A k8s audit log is essentially a very big JSON document with information about the action performed and the response returned.
Since it contains the response, it also contains labels and annotations, whose names in k8s usually contain dots.
So for a case where a pod has 2 labels, app and app.service, we would get a text-object mapping conflict: app has to be a text field for the first label and an object for the second.
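
To make the conflict concrete, here is a minimal sketch of how such key pairs could be spotted. The helper name findDotConflicts is hypothetical (it is not part of Filebeat or Elasticsearch), and it only looks at the keys of a single object, whereas the same conflict can also arise across documents that share an index.

```javascript
// Hypothetical helper: list pairs of keys in one object where the first key
// is a dotted prefix of the second. When dotted names are expanded into
// object paths, the first key would have to be both a leaf value and an object.
function findDotConflicts(obj) {
  var keys = Object.keys(obj);
  var conflicts = [];
  for (var i = 0; i < keys.length; i++) {
    for (var j = 0; j < keys.length; j++) {
      if (i !== j && keys[j].indexOf(keys[i] + ".") === 0) {
        conflicts.push([keys[i], keys[j]]);
      }
    }
  }
  return conflicts;
}

// The two labels from the example above.
var labels = { "app": "app", "app.service": "service" };
console.log(JSON.stringify(findDotConflicts(labels)));
// prints [["app","app.service"]]
```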

While it's theoretically possible to fix this with a mutate filter, the names and/or locations of the labels and annotations can change quite frequently, so that solution becomes problematic.

Describe a specific use case for the enhancement or feature:
For a log such as this:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Request",
  "auditID": "0a840f68-077e-48d0-a4b8-225da9d696d2",
  "stage": "ResponseStarted",
  "requestURI": "/api/v1/nodes/ip-10-8-25-213.us-west-2.compute.internal/proxy/metrics/cadvisor",
  "verb": "get",
  "user": {
    "username": "system:serviceaccount:monitoring:prometheus",
    "uid": "e74966e4-14ed-41f4-a778-f7ed50a296a0",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:monitoring",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "XXXXXX"
  ],
  "userAgent": "Prometheus/2.26.0",
  "objectRef": {
    "resource": "nodes",
    "name": "XXXXXXXXXXX",
    "apiVersion": "v1",
    "subresource": "proxy"
  },
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestReceivedTimestamp": "2021-06-06T13:20:36.598495Z",
  "stageTimestamp": "2021-06-06T13:20:36.758344Z",
  "labels": {
    "app": "app",
    "app.service": "service"
  }
}

My suggestion is for the decode_json_fields processor to have a dedot-keys configuration option that, for every key it decodes, replaces dots with underscores before the event is sent to the output.
We would then see the fields labels.app and labels.app_service in ES.

@OranShuster OranShuster changed the title [filebeat] Add option to dedot keys to ecode_json_fields processor [filebeat] Add option to dedot keys to decode_json_fields processor Jun 6, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 6, 2021
@jsoriano
Member

Hey @OranShuster,

I can think of two possible workarounds for this at the moment.

Use the script processor to dedot the object or do any other transformation to prevent these issues.

Store the result of decode_json_fields in a field of type flattened, that is intended for this very use case. You will need to modify the mapping of your indexes to leverage this.
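
As a rough illustration of the first workaround, the sketch below dedots keys recursively in JavaScript, the language the script processor runs. It is untested inside Filebeat. The names dedotKeys and dedotEvent and the k8s_audit target field are hypothetical, and it assumes the object returned by event.Get can be enumerated like a plain JavaScript object. In the script processor, the entry point would be function process(event) { dedotEvent(event); }; a different name is used here so the snippet also runs under Node without shadowing its process global.

```javascript
// Recursively copy `value`, replacing "." with "_" in every object key.
// ES5 syntax only, to stay within what the script processor's engine supports.
function dedotKeys(value) {
  if (Array.isArray(value)) {
    var items = [];
    for (var i = 0; i < value.length; i++) {
      items.push(dedotKeys(value[i]));
    }
    return items;
  }
  if (value !== null && typeof value === "object") {
    var out = {};
    for (var key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        out[key.split(".").join("_")] = dedotKeys(value[key]);
      }
    }
    return out;
  }
  return value;
}

// Rewrites the decoded JSON stored under the hypothetical "k8s_audit" field,
// i.e. whatever target decode_json_fields was configured to write to.
function dedotEvent(event) {
  var decoded = event.Get("k8s_audit");
  if (decoded) {
    event.Put("k8s_audit", dedotKeys(decoded));
  }
}

// Quick check outside Filebeat with the labels from the issue.
console.log(JSON.stringify(dedotKeys({ labels: { "app": "app", "app.service": "service" } })));
// prints {"labels":{"app":"app","app_service":"service"}}
```

Note that this sketch does not handle collisions: if an object contains both app.service and app_service, the later key silently overwrites the earlier one.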

@jsoriano jsoriano added Team:Elastic-Agent Label for the Agent team Team:Integrations Label for the Integrations team labels Jun 23, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 23, 2021
@OranShuster
Author

Store the result of decode_json_fields in a field of type flattened, that is intended for this very use case. You will need to modify the mapping of your indexes to leverage this.

It's not clear from the flattened docs how it will handle these types of inputs.
for a document such as

{
"app":"appName",
"app.v1.label": "appLabel"
}

Will we have 2 leaf nodes, app and label?
So to get the first field I would use doc['app'].value,
but for the label field I would use doc['app.v1.label'].value? Will that work?

I think I will go with the script processor route, as it will give me more predictable results.

Thanks for the reply

@jsoriano
Member

It's not clear from the flattened docs how it will handle these types of inputs.

flattened allows storing objects whose members have any name, including names with dots. But it doesn't work on the root document; it needs to be used as a named field.

for a document such as

{
"app":"appName",
"app.v1.label": "appLabel"
}

You cannot do that, but if you define for example a kubernetes_audit field as flattened, you can have this document:

{
"kubernetes_audit.app":"appName",
"kubernetes_audit.app.v1.label": "appLabel"
}

And then in principle you can access to these fields as doc['kubernetes_audit.app'] and doc['app.v1.label'].

Fields inside flattened objects support the same operations you would expect for keywords, and there are plans to support numeric operations in the future too.

@OranShuster
Author

OranShuster commented Jun 23, 2021

And then in principle you can access to these fields as doc['kubernetes_audit.app'] and doc['app.v1.label'].

I guess you meant doc['kubernetes_audit.app.v1.label'] here.
This clarifies how this field is processed.
While I don't think I'll use it, it's a good option to know about.
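
For reference, the flattened option discussed above could look roughly like the sketch below. The mapping body follows the usual Elasticsearch index mapping shape with kubernetes_audit declared as flattened. The helper flattenedPaths is hypothetical and only illustrates how the full key paths are formed, on the assumption that each leaf is addressed as the field name plus the dotted key.

```javascript
// Request body for creating an index whose "kubernetes_audit" field is of
// type flattened (field name taken from the example in this thread).
var mappingBody = {
  mappings: {
    properties: {
      kubernetes_audit: { type: "flattened" }
    }
  }
};

// A document for that index: the dotted keys live inside the flattened field.
var auditDoc = {
  kubernetes_audit: {
    "app": "appName",
    "app.v1.label": "appLabel"
  }
};

// Hypothetical helper: list the dotted paths under which the leaf values of
// a flattened field would be addressed. Arrays are treated as leaves.
function flattenedPaths(prefix, value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return [prefix];
  }
  var paths = [];
  for (var key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      paths = paths.concat(flattenedPaths(prefix + "." + key, value[key]));
    }
  }
  return paths;
}

console.log(JSON.stringify(mappingBody, null, 2));
console.log(JSON.stringify(flattenedPaths("kubernetes_audit", auditDoc.kubernetes_audit)));
// second line prints ["kubernetes_audit.app","kubernetes_audit.app.v1.label"]
```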

@botelastic

botelastic bot commented Jun 23, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jun 23, 2022
@ruflin
Contributor

ruflin commented Jun 24, 2022

Here is an interesting new feature in Elasticsearch that could help with the dots: elastic/elasticsearch#86166

@botelastic botelastic bot removed the Stalled label Jun 24, 2022
@botelastic

botelastic bot commented Oct 13, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 13, 2023
@OranShuster
Author

OranShuster commented Oct 13, 2023

👍

@botelastic botelastic bot removed the Stalled label Oct 13, 2023
@botelastic

botelastic bot commented Oct 12, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 12, 2024
4 participants