Out of the box ECS field mappings for Custom Input packages #4236
Comments
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
@jsoriano, @andrewkroh can you chime in here?
This is a long-standing issue. We should have a way to include agent-specific mappings in the index templates of any input, or of any integration in general. I don't think that these mappings belong in packages, as the fields are not produced by the packages. If these mappings are included in packages, any update to the agent or the processes it manages would require updating and releasing all packages, and users would need to update them. At some scale this is barely possible. But at this moment this is the only option we have. In elastic/package-spec#63 and elastic/package-spec#199 something like an "Agent Common Schema" is proposed. This would be a schema including all common fields that an agent can generate, and Fleet would install it along with the mappings included in a given package.
@jsoriano I think that one of the issues is that Filebeat itself would install these fields when using raw inputs if the processors were used, and it would be nice to be able to at least include the minimum fields. Isn't there something we could do in the meantime? Or would that just cause more issues later down the line? It's either that or we should disable the
For as long as Agent is automatically including processors in config without the option of disabling them, then I think our input packages must generate mappings that include the fields produced by these processors. In effect that means including field definitions for fields produced by
Longer term my preference for Agent is to never enable processors by default. They should always be opt-in whether that is by the integration developer (e.g. a conscious decision to always add specific
I think we should solve the issues relating to management and maintenance of mappings for inputs and processors. This will make it easier to scale the number of integrations we maintain.
Yeah, I agree that this is the only solution at the moment. I am not sure though if we should do much to support this, as we don't want it long term. The way to do this now is to copy and paste manually. Perhaps a way to support this mid-term is to implement in elastic-package some kind of import mechanism like the one we have for ECS fields, but one that includes whole sets of fields. We would need to have the field definitions of these processors somewhere. This could be the "Agent Common Schema" that has appeared in previous discussions, and it would also be useful if later on we make the use of these processors an opt-in feature.
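As a rough illustration of the kind of import mechanism described above, here is a minimal Python sketch. It is not how elastic-package works: the field set names (`host_metadata`, `cloud_metadata`), their contents, and the `import_fieldsets` helper are all hypothetical, and field definitions are reduced to a flat name-to-type dictionary.

```python
from typing import Dict, Iterable

# Hypothetical shared field sets (an "Agent Common Schema" stand-in).
# Names and contents are illustrative assumptions, not a real schema.
FIELDSETS: Dict[str, Dict[str, str]] = {
    "host_metadata": {"host.ip": "ip", "host.name": "keyword"},
    "cloud_metadata": {"cloud.account.id": "keyword"},
}


def import_fieldsets(
    package_fields: Dict[str, str],
    wanted: Iterable[str],
    fieldsets: Dict[str, Dict[str, str]] = FIELDSETS,
) -> Dict[str, str]:
    """Merge whole field sets into a package's own field definitions."""
    merged: Dict[str, str] = {}
    for name in wanted:
        if name not in fieldsets:
            raise KeyError(f"unknown field set: {name}")
        merged.update(fieldsets[name])
    # Assumption: definitions declared by the package itself take precedence.
    merged.update(package_fields)
    return merged
```

With something like this, a package would declare which sets it needs instead of copying field definitions by hand, for example `import_fieldsets({"message": "match_only_text"}, ["host_metadata"])`.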
Agree, but I think that this is not a decision for the integration developer to make. I don't see why one service integration may want some of these processors while others don't. I think that this is either a product decision (as it is now) or a user decision, where users choose what metadata to add depending on their needs and deployments.
+1, this has to be outside the package development process.
So maybe a plan is:
@andrewkroh wdyt?
Duplication of fields in data streams should be already detected when using |
@jsoriano Overall I like this plan. For the short term, I would say we should add mappings manually to these custom input packages for the fields that need non-default mappings (e.g. host.ip, and cloud.account.id to prevent it from being detected as a number). This way we can address the field conflicts that users are experiencing today.
Mid-term, it sounds great to have sets of fields that can be imported. I would expect this to be useful in the long term as well, because we could use it for fields associated with input types that are often reused (like import the fieldset for the "tcp" input or import the fieldset associated to the
Long-term, I like the idea of giving control to the user and putting Fleet in charge of the mapping. I can think of some things that make this complicated to manage, but I like the direction.
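To make the short-term suggestion concrete, here is a minimal Python sketch that turns a few dotted field names into a nested, Elasticsearch-style `properties` mapping. The `build_mappings` helper is hypothetical and only covers the two fields named above; real packages declare fields in their own field definition files rather than through code like this.

```python
from typing import Any, Dict


def build_mappings(fields: Dict[str, str]) -> Dict[str, Any]:
    """Expand dotted field names into a nested 'properties' mapping."""
    root: Dict[str, Any] = {"properties": {}}
    for dotted_name, field_type in fields.items():
        node = root
        parts = dotted_name.split(".")
        # Walk or create the intermediate object levels (e.g. "cloud", "account").
        for part in parts[:-1]:
            node = node["properties"].setdefault(part, {"properties": {}})
        node["properties"][parts[-1]] = {"type": field_type}
    return root


# Fields that need a non-default type, as discussed in the thread:
# host.ip should be an IP, and cloud.account.id must stay a keyword
# even when the value looks numeric.
explicit = build_mappings({"host.ip": "ip", "cloud.account.id": "keyword"})
```

Declaring only the fields whose default detection would be wrong keeps the manual list short, which matches the goal of fixing today's conflicts without copying every field.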
I meant duplication in the sense that we are cloning fields.yml files between integrations in order to "import" the set of fields that are associated with agent inputs and processors. Not about the same field being declared more than once. That detection is working.
Do you have a list of such mappings? If there are only a few of them, maybe we can hard-code them for now in elastic-package (or Fleet), if this is a low-hanging fix for the current issues.
Ah ok, this would be solved by the proposed plan 👍 I will create the follow-up tasks to implement this.
Great progress! Thanks a lot for the nice feedback @jsoriano @andrewkroh. I have 2 new input integrations planned, which I will put slightly on hold until this is resolved, so that we don't have any unnecessary changes so soon after release. Let me know if there is anywhere that I can help 👍
It'd be great if you could help compile the list of fields that would be good to include in the hard-coded workaround in elastic/elastic-package#1018. Thanks!
Will put that on the todo list for tomorrow then 👍
related |
@felixbarny Do you think this issue is solved by the ECS enhancements to the default |
I think this issue should be solved by that. It seems the missing piece would be to either remove the index templates for input packages (which we probably don't want due to the |
Nice, so we can hope that in 8.11-12 we'll finally have this |
This is already in Elasticsearch 8.9: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/template-resources/src/main/resources/ecs-dynamic-mappings.json
I'm on 8.10.3 and currently got some mapping conflicts with type long:
In the proposed dynamic mapping, I do not see any mapping for type |
The dynamic ECS mapping does not explicitly map fields that have a default mapping, such as number -> long, string -> keyword. This allows the mapping to be very compact and lightweight, and it does not need a change every time a new field is added. The tradeoff is that this doesn't guard against documents that come in with the wrong type. For example, a document containing
How did the conflicts come to be? Was that a case of malformed data being shipped? What was the cause for that?
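The tradeoff mentioned above, that default dynamic mappings do not guard against wrong types, can be illustrated with a small Python sketch. This is a deliberately simplified model and not Elasticsearch's actual detection logic: it assumes each index takes a field's type from the first document it sees, uses the "number -> long, string -> keyword" defaults as described in this thread, and treats documents as flat dictionaries with dotted keys. The function names are hypothetical.

```python
from typing import Any, Dict, Set


def guess_type(value: Any) -> str:
    """Simplified default type detection for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "keyword"
    raise TypeError(f"unsupported value: {value!r}")


def infer_index_mapping(first_doc: Dict[str, Any]) -> Dict[str, str]:
    """Mapping an index would get if the first document decided every type."""
    return {field: guess_type(value) for field, value in first_doc.items()}


def find_conflicts(index_mappings: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Fields mapped with more than one type across indices."""
    seen: Dict[str, Set[str]] = {}
    for mapping in index_mappings.values():
        for field, field_type in mapping.items():
            seen.setdefault(field, set()).add(field_type)
    return {field: types for field, types in seen.items() if len(types) > 1}


# Two hypothetical backing indices whose first documents disagree on the JSON type.
mappings = {
    "index-a": infer_index_mapping({"cloud.account.id": 123456789012}),
    "index-b": infer_index_mapping({"cloud.account.id": "my-account"}),
}
conflicts = find_conflicts(mappings)
```

In this model the same field ends up as `long` in one index and `keyword` in the other, which is the kind of conflict an explicit mapping for that field would prevent.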
I started using the new getting back to your example for the from my point of view I could/would add them to the dynamic mapping
When I asked:
I was asking whether this will be the default for all available custom integrations, without requiring me to do any import.
@kpollich Anything from your side?
Currently the custom input packages (like TCP/UDP, httpjson, etc.) come with the bare minimum of ECS mappings, very similar to how custom inputs worked in Filebeat. However, this does not produce the best outcome for end users, as functionality like add_*_metadata, for example, produces ECS fields, especially host, which is enabled by default.
This issue is to discuss what the best practice should be for all Custom Input packages, and will then be used to track the status of applying any decided changes.
Packages: