Out of the box ECS field mappings for Custom Input packages #4236

P1llus · 2022-09-20T07:26:34Z

elasticmachine · 2022-09-20T07:26:35Z

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

narph · 2022-10-18T13:36:31Z

@jsoriano , @andrewkroh can you chime in here?

jsoriano · 2022-10-18T13:51:42Z

This is a long-standing issue. We should have a way to include agent-specific mappings in the index templates of any input, or of any integration in general.

I don't think that these mappings belong in packages, as they are not produced by them. If these mappings are included in packages, any update to the agent or the processes it manages would require updating and releasing all packages, and users would need to update them. At some scale this is barely feasible. But at the moment this is the only option we have.

In elastic/package-spec#63 and elastic/package-spec#199 something like an "Agent Common Schema" is proposed. This would be a schema including all common fields that an agent can generate, and Fleet would install it along with the mappings included in a given package.

P1llus · 2022-10-19T11:14:26Z

@jsoriano I think that one of the issues is that filebeat itself would install these fields when using raw inputs if the processors were used, and it would be nice to be able to at least include the minimum fields.

Isn't there something we could do in the meantime? Or would that just cause more issues later down the line? It's either that or we should disable the add_host_metadata processor by default until we have it resolved, WDYT?

andrewkroh · 2022-10-24T23:00:38Z

For as long as Agent is automatically including processors in config without the option of disabling them, then I think our input packages must generate mappings that include the fields produced by these processors. In effect that means including field definitions for fields produced by add_{host,cloud,kubernetes,docker}_metadata to all input packages. I think this needs to happen now even if we don't have solutions for avoiding duplication of the field definitions (these lists of fields are already copy/pasted across nearly every logging integration).

Longer term my preference for Agent is to never enable processors by default. They should always be opt-in whether that is by the integration developer (e.g. a conscious decision to always add specific processors into agent input config template) or by the end-user (e.g. enabling specific processors at integration config time via toggles exposed in the UI or via direct YAML specification). This would simplify the thinking around input configs and put the developers and users in control of the data. Today you always have to account for these four magic processors that are always enabled.

I think we should solve the issues relating to management and maintenance of mappings for inputs and processors. This will make it easier to scale the number of integrations we maintain.

jsoriano · 2022-10-25T10:24:53Z

> For as long as Agent is automatically including processors in config without the option of disabling them, then I think our input packages must generate mappings that include the fields produced by these processors. In effect that means including field definitions for fields produced by add_{host,cloud,kubernetes,docker}_metadata to all input packages. I think this needs to happen now even if we don't have solutions for avoiding duplication of the field definitions (these lists of fields are already copy/pasted across nearly every logging integration).

Yeah, I agree that this is the only solution at the moment. I am not sure though if we should do much to support this, as we don't want it long term. The way to do this now is to copy and paste manually.

Perhaps a way to support this mid-term is to implement in elastic-package some kind of import mechanism like the one we have for ECS fields, but one that includes whole sets of fields. We would need to have the field definitions of these processors somewhere; this could be the "Agent Common Schema" that has appeared in previous discussions, and it would also be useful if later on we make the use of these processors an opt-in feature.

> Longer term my preference for Agent is to never enable processors by default. They should always be opt-in whether that is by the integration developer (e.g. a conscious decision to always add specific processors into agent input config template) or by the end-user (e.g. enabling specific processors at integration config time via toggles exposed in the UI or via direct YAML specification).

Agreed, but I don't think this is a decision for the integration developer to make. I don't see why one service integration would want some of these processors while others don't. I think this is either a product decision (as it is now) or a user decision, where users choose what metadata to add depending on their needs and deployments.

> This would simplify the thinking around input configs and put the developers and users in control of the data. Today you always have to account for these four magic processors that are always enabled.

+1, this has to be kept out of the package development process.

> I think we should solve the issues relating to management and maintenance of mappings for inputs and processors. This will make it easier to scale the number of integrations we maintain.

So maybe a plan is:

Define something like the "Agent Common Schema" as the source of truth for these mappings. I think we need this in any case.
Short/mid-term: implement something in elastic-package to import fields from this schema at build time. This could initially be hard-coded to the current list of included processors, so package developers "only" need to remove duplicated definitions.
Longer-term: users are able to select the processors they use from Fleet, mappings are installed by Fleet, and we stop shipping them in packages by default.
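
To make the short/mid-term step of this plan more concrete, here is a minimal Python sketch of what a build-time import could look like. Everything in it is hypothetical: `AGENT_COMMON_FIELDS` stands in for an "Agent Common Schema" that does not exist yet, `import_common_fields` is not an elastic-package function, and the field types are only illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an "Agent Common Schema": field name -> mapping type.
AGENT_COMMON_FIELDS: Dict[str, str] = {
    "host.ip": "ip",
    "host.name": "keyword",
    "cloud.account.id": "keyword",
}


def import_common_fields(
    package_fields: Dict[str, str],
    common_fields: Dict[str, str] = AGENT_COMMON_FIELDS,
) -> Tuple[Dict[str, str], List[str]]:
    """Merge the common fields into a package's own field definitions.

    Returns the merged definitions plus the names the package declares even
    though the common schema already provides them, so that the package
    developer knows which definitions to remove.
    """
    merged = dict(common_fields)
    duplicates = []
    for name, field_type in package_fields.items():
        if name in common_fields:
            # The common schema is treated as the source of truth here.
            duplicates.append(name)
            continue
        merged[name] = field_type
    return merged, sorted(duplicates)
```

In this sketch the common schema wins over a duplicated package definition; whether a package should be allowed to override it is a design question the plan leaves open.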

@andrewkroh wdyt?

> we don't have solutions for avoiding duplication of the field definitions

Duplication of fields in data streams should already be detected when using format_version: 2.0.0, please let us know if this is not working.

andrewkroh · 2022-10-25T20:39:46Z

@jsoriano Overall I like this plan.

For the short-term, I would say we should add mappings manually to these custom input packages for the fields that need non-default mappings (e.g. host.ip, or cloud.account.id to prevent it from being detected as a number). This way we can address the field conflicts that users are experiencing today.

Mid-term, it sounds great to have sets of fields that can be imported. I would expect this to be useful in the long term as well, because we could use it for fields associated with input types that are often reused (like importing the fieldset for the "tcp" input or the fieldset associated with the syslog beat processor).

Long-term, I like the idea of giving control to the user and putting Fleet in charge of the mapping. I can think of some things that make this complicated to manage, but I like the direction.

> Duplication of fields in data streams should already be detected when using format_version: 2.0.0, please let us know if this is not working.

I meant duplication in the sense that we are cloning fields.yml files between integrations in order to "import" the set of fields that are associated with agent inputs and processors, not the same field being declared more than once. That detection is working.

jsoriano · 2022-10-26T09:03:33Z

> For the short-term, I would say we should add mappings manually to these custom input packages for the fields that need non-default mappings (e.g. host.ip, or cloud.account.id to prevent it from being detected as a number). This way we can address the field conflicts that users are experiencing today.

Do you have a list of such mappings? If there are only a few of them, maybe we can hard-code them for now in elastic-package (or Fleet), if this is a low-hanging fix for the current issues.

> I meant duplication in the sense that we are cloning fields.yml files between integrations in order to "import" the set of fields that are associated with agent inputs and processors.

Ah ok, this would be solved by the proposed plan 👍

I will create the follow-up tasks to implement this.

jsoriano · 2022-10-26T16:01:37Z

Issues created to implement the above plan:

P1llus · 2022-10-26T16:09:03Z

Great progress! Thanks a lot for the nice feedback @jsoriano @andrewkroh .

I have 2 new input integrations planned, which I will put slightly on hold until it is resolved, so that we don't have any unnecessary changes so close after release.

Let me know if there is anywhere that I can help 👍

jsoriano · 2022-10-26T16:20:49Z

> Let me know if there is anywhere that I can help 👍

It'd be great if you could help compile the list of fields that would be good to include in the hard-coded workaround in elastic/elastic-package#1018.

Thanks!

P1llus · 2022-10-26T16:30:13Z

Will put that on the todo list for tomorrow then 👍

zez3 · 2023-09-20T09:36:15Z

#3151

joshdover · 2023-10-12T14:49:41Z

@felixbarny Do you think this issue is solved by the ECS enhancements to the default logs-* template?

felixbarny · 2023-10-13T09:10:30Z

I think this issue should be solved by that. It seems the missing piece would be to either remove the index templates for input packages (which we probably don't want due to the <package>@custom components and pipelines) or to import the ecs template. The latter should be doable, but it means requiring a minimum ES version, and the ecs template will be versioned with ES, not with the input packages. I don't see an issue with that, however.

zez3 · 2023-10-13T17:18:07Z

> should be doable, but it means requiring a minimum ES version, and the ecs template will be versioned with ES, not with the input packages. I don't see an issue with that, however.

Nice, so we can hope that in 8.11-12 we'll finally have this
@P1llus ?

felixbarny · 2023-10-13T17:23:22Z

This is already in Elasticsearch 8.9: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/template-resources/src/main/resources/ecs-dynamic-mappings.json

zez3 · 2023-10-15T10:59:51Z

I'm on 8.10.3 and currently got some mapping conflicts with type long:

network.packets
network.bytes
http.response.status_code
server.port
client.port

In the proposed dynamic mapping, I do not see any mapping for type long

felixbarny · 2023-10-16T07:02:09Z

The dynamic ECS mapping does not explicitly map fields that have a default mapping, such as number -> long, string -> keyword. This allows the mapping to be very compact and lightweight, as well as not needing a change every time a new field is added.

The tradeoff is that this doesn't guard against documents that come in with the wrong type. For example, a document containing network.bytes: true could set up an incorrect mapping. This is somewhat, although not fully, mitigated by ignore_malformed=true.
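
The tradeoff can be illustrated with a toy Python model of "first document wins" dynamic mapping. This is a deliberately simplified sketch, not how Elasticsearch is implemented: `ToyIndex` and `infer_type` are hypothetical names, the type rules only follow the defaults mentioned in this comment (number -> long, string -> keyword), and the effect of ignore_malformed is not modeled.

```python
from typing import Any, Dict, List


def infer_type(value: Any) -> str:
    """Guess a mapping type from a JSON-like value (simplified)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "keyword"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


class ToyIndex:
    """Keeps the type of the first value seen for each field."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}

    def index(self, doc: Dict[str, Any]) -> List[str]:
        """Index a flat document and return the fields whose type conflicts."""
        conflicts = []
        for field, value in doc.items():
            detected = infer_type(value)
            # setdefault stores the detected type only for unseen fields.
            mapped = self.mapping.setdefault(field, detected)
            if mapped != detected:
                conflicts.append(field)
        return conflicts
```

If the first document indexed carries `network.bytes: True`, the toy index maps the field as boolean, and a later, well-formed document with a numeric `network.bytes` is the one reported as conflicting.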

> currently got some mapping conflicts with type long

How did the conflicts come to be? Was that a case of malformed data being shipped? What was the cause for that?

zez3 · 2023-10-17T07:44:19Z

> How did the conflicts come to be? Was that a case of malformed data being shipped? What was the cause for that?

I started using the new Custom Filestream Logs integration, and after building the parser for my custom logs I forgot to add the fields to the logs-filestream.generic@custom component template, which is how I handled this before.
By "before" I mean the other/older custom integrations like TCP, UDP, HTTP endpoint, and journald, where I also had conflicts in the past.
Ideally, I would have a global ...@custom component template that is used by default for all of my custom integrations, including any new ones.
@ruflin got some discussion going on this in the past, if I remember correctly.

Getting back to your example, network.bytes: true:
If during the parser building phase I see such a field that is already defined in ECS with a specific type (long, see https://www.elastic.co/guide/en/ecs/current/ecs-network.html), I rename it to something else, like network.bytes.enabled.
I have had such cases in the past and will probably have them in the future.

As for
http.response.status_code
server.port
client.port

from my point of view they could be added to the dynamic mapping:

*.port and http.*.status_code will be a long number type 99% of the time.
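
As a rough illustration of this suggestion, the following Python sketch resolves a field path to a type using wildcard rules, loosely modeled on the `path_match` option of Elasticsearch dynamic templates. The rule list is hypothetical and is not the content of the ecs template shipped with Elasticsearch; `fnmatchcase` globbing only approximates how Elasticsearch matches paths.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical rules taken from the suggestion above.
PATH_RULES: List[Tuple[str, str]] = [
    ("*.port", "long"),
    ("http.*.status_code", "long"),
]


def type_for_path(
    path: str, rules: List[Tuple[str, str]] = PATH_RULES
) -> Optional[str]:
    """Return the type of the first rule whose pattern matches, else None."""
    for pattern, field_type in rules:
        if fnmatchcase(path, pattern):
            return field_type
    return None
```

With these rules, `server.port`, `client.port`, and `http.response.status_code` resolve to long, while `network.bytes` and `network.packets` match no rule and would still depend on the default detection.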

When I asked:

> Nice, so we can hope that in 8.11-12 we'll finally have this

I was asking whether this will be available by default for all custom integrations, without me needing to import anything.

zez3 · 2023-10-17T08:44:21Z

@kpollich Anything from your side?

P1llus added the discuss, Team:Security-External Integrations, and Team:Service-Integrations labels Sep 20, 2022

P1llus changed the title ~~Out of the box mappings for Custom Input packages~~ Out of the box ECS field mappings for Custom Input packages Sep 20, 2022

jsoriano mentioned this issue Oct 26, 2022: [Change Proposal] Introduce an "Agent Common Schema" elastic/package-spec#441 (Open)

jsoriano mentioned this issue Oct 26, 2022: Temporarily import hard-coded list of common fields elastic/elastic-package#1018 (Closed, 3 tasks)

zez3 mentioned this issue Oct 12, 2023: [Bug] Elastic Agent Metricbeat Data Stream incorrectly maps network field as a keyword rather than an object #1572 (Open)

zez3 mentioned this issue Oct 19, 2023: Upgrading component template logs-settings failed after update to 8.9 elastic/elasticsearch#98247 (Open)

narph removed the Team:Security-External Integrations label Jan 29, 2024
