[Feature] Dead letter queue #86170

Closed
ruflin opened this issue Apr 26, 2022 · 13 comments
Labels
:Data Management/Data streams Data streams and their lifecycles discuss Team:Data Management Meta label for data/management team

Comments

@ruflin
Contributor

ruflin commented Apr 26, 2022

With Elastic Agent we are fully embracing data streams and the data stream naming scheme. In many scenarios, we control the ingestion data structure and the mappings put in place. But because we encourage everyone to use the data stream naming scheme, and for example put a basic ECS template in place for logs-*-*, incoming data can conflict with those mappings at ingest time. For example, the field foo may be mapped as an object while someone ingesting data sends foo as a keyword.
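To make the object-vs-keyword clash concrete, here is a deliberately simplified sketch of how an explicit mapping can reject a document. The mapping, the field names, and `find_conflicts` are hypothetical; real Elasticsearch mapping logic is far richer, and fields missing from the mapping are skipped here because dynamic mapping would handle them in a real cluster.

```python
from typing import Any

# Hypothetical mapping, loosely modelled on an ECS-style template:
# "foo" is declared as an object, "message" as a text field.
MAPPING: dict[str, Any] = {
    "properties": {
        "foo": {"type": "object", "properties": {"bar": {"type": "keyword"}}},
        "message": {"type": "text"},
    }
}


def find_conflicts(doc: dict[str, Any], mapping: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths where the document's shape contradicts the mapping.

    Simplified: only the object-vs-concrete-value clash is checked.
    """
    conflicts: list[str] = []
    properties = mapping.get("properties", {})
    for name, value in doc.items():
        path = f"{prefix}{name}"
        field_mapping = properties.get(name)
        if field_mapping is None:
            continue  # unmapped field: left to dynamic mapping, not modelled here
        expects_object = field_mapping.get("type", "object") == "object"
        if expects_object and not isinstance(value, dict):
            conflicts.append(path)  # object mapped, concrete value sent
        elif not expects_object and isinstance(value, dict):
            conflicts.append(path)  # concrete value mapped, object sent
        elif expects_object:
            conflicts.extend(find_conflicts(value, field_mapping, prefix=f"{path}."))
    return conflicts


print(find_conflicts({"foo": "some-keyword"}, MAPPING))  # ['foo']
```

In a real cluster, a conflict like the one reported for `foo` is what currently causes the whole event to be rejected.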

Currently, Elasticsearch just rejects the data with an error. Instead, it would be nice to be able to configure a dead letter queue where these events end up. This ensures the client does not have to deal with mapping conflicts and that all data is ingested.

This dead letter queue could be generic or per data stream (up for discussion). I assume that by default this dead letter queue would not have any mappings specified, so queries against it would have to use runtime fields.

Users could look at the dead letter queue, use it to debug their ingest pipelines / mappings, and then "reindex" some of the events in the dead letter queue.
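The proposed flow (route rejected events to a DLQ instead of failing, then replay them once the mapping or pipeline is fixed) can be sketched in memory as follows. This is a hypothetical model, not an Elasticsearch API: `DataStreamWithDLQ`, `replay`, and the validator functions are illustrative names. It picks the per-data-stream option and stores the raw source without applying any mapping, matching the assumption above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# A validator returns an error description, or None if the document is accepted.
Validator = Callable[[dict[str, Any]], Optional[str]]


@dataclass
class DataStreamWithDLQ:
    """In-memory sketch of a data stream with its own dead letter queue."""

    validate: Validator
    documents: list[dict[str, Any]] = field(default_factory=list)
    dead_letters: list[dict[str, Any]] = field(default_factory=list)

    def ingest(self, doc: dict[str, Any]) -> None:
        error = self.validate(doc)
        if error is None:
            self.documents.append(doc)
        else:
            # Keep the raw source plus the failure reason instead of rejecting the event.
            self.dead_letters.append({"error": error, "source": doc})

    def replay(self, validate: Optional[Validator] = None) -> int:
        """Re-ingest dead letters (e.g. after a mapping fix); return how many still fail."""
        if validate is not None:
            self.validate = validate
        pending, self.dead_letters = self.dead_letters, []
        for entry in pending:
            self.ingest(entry["source"])
        return len(self.dead_letters)


def foo_must_be_object(doc: dict[str, Any]) -> Optional[str]:
    """Hypothetical validator mirroring the foo object/keyword conflict."""
    if "foo" in doc and not isinstance(doc["foo"], dict):
        return "foo is mapped as an object but a concrete value was sent"
    return None


stream = DataStreamWithDLQ(validate=foo_must_be_object)
stream.ingest({"foo": {"bar": "x"}})
stream.ingest({"foo": "plain-string"})
print(len(stream.documents), len(stream.dead_letters))  # 1 1
print(stream.replay(lambda doc: None))  # 0: the "fixed" validator accepts everything
```

Keeping the error next to the untouched source is what makes the later debugging and replay step possible.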

@felixbarny felixbarny added the :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label May 3, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 3, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner added :Data Management/Data streams Data streams and their lifecycles and removed :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels May 3, 2022
@elasticmachine elasticmachine added Team:Data Management Meta label for data/management team and removed Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels May 3, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner
Contributor

I'm moving this over to the data management team for now. It's definitely on the border between the data-management and distrib areas but I think the data management folks are a better choice to think about this idea first.

@felixbarny
Member

Thanks @DaveCTurner for the help in triaging this. I'm not sure whether it has an impact on responsibilities within the ES team, but it's currently an open question whether DLQs should be exclusive to data streams or should also apply to regular indices. @jsvd brought up some good arguments in favor of also adding DLQs to regular indices, to support non-time-series use cases in Logstash and Enterprise Search that would benefit from a DLQ.

@felixbarny
Member

DLQs could also help with a lot of the use cases that are mentioned in

@threatangler-jp

threatangler-jp commented Jul 13, 2022

This is great. We have achieved the same thing using a different method.

We set ignore_malformed to true in the Filebeat and Elastic Agent integration index templates. Some native index templates will not accept the ignore_malformed: true setting, though, so that is a blind spot.
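For readers unfamiliar with ignore_malformed, here is a simplified model of its effect on a numeric field. `index.mapping.ignore_malformed` is a real index-level setting; placing it under a template's `template.settings` is one possible location. The `index_long_field` helper and `MapperParsingError` are hypothetical stand-ins, and the error text is illustrative only. The point is the trade-off: without the setting the whole event is rejected, with it the event is kept and only the bad field is skipped and listed in `_ignored`.

```python
from typing import Any

# Real setting name; shown as it could appear under a template's "template.settings".
SETTINGS: dict[str, Any] = {"index.mapping.ignore_malformed": True}


class MapperParsingError(Exception):
    """Hypothetical stand-in for the rejection Elasticsearch returns."""


def index_long_field(doc: dict[str, Any], name: str, ignore_malformed: bool) -> dict[str, Any]:
    """Simplified model of indexing one `long` field of a document."""
    indexed: dict[str, Any] = {"_ignored": []}
    if name not in doc:
        return indexed
    try:
        indexed[name] = int(doc[name])  # numeric strings such as "200" are coerced
    except (TypeError, ValueError):
        if not ignore_malformed:
            # Illustrative message only; the whole document would be rejected.
            raise MapperParsingError(f"failed to parse field [{name}] of type [long]")
        indexed["_ignored"].append(name)  # document kept, field skipped and recorded
    return indexed


print(index_long_field({"status": "200"}, "status", ignore_malformed=False))  # {'_ignored': [], 'status': 200}
print(index_long_field({"status": "n/a"}, "status", ignore_malformed=True))  # {'_ignored': ['status']}
```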

We also populate error.message any time a pipeline processor fails, and we enable ignore_failure on all processors. We also occasionally enable logging on our agents (it is too costly to do all the time). We then have a quality assurance process that searches a * index pattern with the following query:

(message : mapper_parsing_exception) OR (error.message : *) OR (_ignored : *) OR (message : dropping)
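The four conditions in that query can be approximated locally, for example to triage exported events. This is only a rough sketch: `needs_review` and its helpers are hypothetical, the regex tokenizer stands in for the text analyzer behind `message : ...`, and `_ignored` is treated as a plain key even though in Elasticsearch it is a metadata field on the hit rather than part of `_source`.

```python
import re
from typing import Any, Optional


def get_field(doc: dict[str, Any], dotted: str) -> Optional[Any]:
    """Look up a dotted field name in either flat ("error.message") or nested form."""
    if dotted in doc:
        return doc[dotted]
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def message_terms(doc: dict[str, Any]) -> set[str]:
    """Very rough stand-in for analyzing the message field into lowercase terms."""
    return set(re.findall(r"\w+", str(doc.get("message", "")).lower()))


def needs_review(doc: dict[str, Any]) -> bool:
    """Local approximation of:
    (message : mapper_parsing_exception) OR (error.message : *)
    OR (_ignored : *) OR (message : dropping)
    """
    terms = message_terms(doc)
    return (
        "mapper_parsing_exception" in terms
        or get_field(doc, "error.message") is not None
        or bool(doc.get("_ignored"))
        or "dropping" in terms
    )


samples = [
    {"message": "all good"},
    {"message": "mapper_parsing_exception raised for field foo"},
    {"message": "ok", "error": {"message": "grok pattern did not match"}},
    {"message": "ok", "_ignored": ["url.original"]},
    {"message": "Dropping event"},
]
print([needs_review(d) for d in samples])  # [False, True, True, True, True]
```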

We catch these issues reactively and are then able to fix them (at least the ones we are not blind to). We have been shocked, though, by the volume of these issues coming from native index template settings in new Filebeat modules and Agent integrations. So we are asking: is the root issue a lack of discipline in aligning with ECS when Elastic builds new modules and integrations? A minority of the issues are not related to ECS alignment but to field character limits, and the affected fields are easily identifiable as ones that need a larger character limit.

@felixbarny
Member

A minority of the issues are not ECS alignment related but are field char limitation related but the fields this is happening to are easily identifiable as a field that would need a larger char limit.

Could you elaborate on which field character limit you are talking about and how you've fixed it in your mapping?

We're currently working on improved default mappings for logs that are more resilient, i.e. not prone to mapping conflicts and field explosions.

@threatangler-jp

threatangler-jp commented Jul 25, 2022 via email

@felixbarny
Member

Where does the char limitation come from and what's the default value? Do you have a link handy to the Elasticsearch docs?

@threatangler-jp

threatangler-jp commented Jul 25, 2022 via email

@felixbarny
Member

I still don't understand which char limit you are talking about and what the impact of this is 🙂

Are there any exceptions on ingest that you can share? Are you referring to the ignore_above option? If a value is longer than that, it shouldn't stop ingestion, but the field isn't indexed. Maybe that's the issue you are facing?

@threatangler-jp

Correct, ignore_above.

From the documentation: "Strings longer than the ignore_above setting will not be indexed or stored."

https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
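A simplified model of ignore_above on a keyword field shows why it doesn't block ingestion, which is the behaviour discussed above. The document is accepted, the original value stays in `_source`, only values longer than the limit are left out of the index, and the field name is recorded in `_ignored` (which is why a `_ignored : *` search surfaces these events). For arrays the limit applies per element. `index_keyword` is a hypothetical helper, and Python's `len` is used as a stand-in for Elasticsearch's character count.

```python
from typing import Any


def index_keyword(doc: dict[str, Any], name: str, ignore_above: int) -> dict[str, Any]:
    """Simplified model of a `keyword` field with `ignore_above`."""
    result: dict[str, Any] = {"_source": dict(doc), "indexed_terms": {}, "_ignored": []}
    raw = doc.get(name)
    values = raw if isinstance(raw, list) else [raw]
    strings = [v for v in values if isinstance(v, str)]
    kept = [v for v in strings if len(v) <= ignore_above]  # limit applied per element
    if kept:
        result["indexed_terms"][name] = kept
    if len(kept) < len(strings):
        result["_ignored"].append(name)  # some value was too long to index
    return result


# 256 is the ignore_above that dynamic mapping puts on the ".keyword" sub-field of strings.
long_url = "https://example.com/" + "a" * 300
out = index_keyword({"url": ["https://example.com/", long_url]}, "url", ignore_above=256)
print(out["indexed_terms"], out["_ignored"])  # {'url': ['https://example.com/']} ['url']
```

Raising the limit for the few fields known to carry long values (as suggested earlier in the thread) trades index size for searchability of those values.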

@dakrone
Member

dakrone commented Apr 25, 2023

Closing this in favor of #95534

@dakrone dakrone closed this as completed Apr 25, 2023

7 participants