[Feature] Dead letter queue #86170
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
Pinging @elastic/es-data-management (Team:Data Management) |
I'm moving this over to the data management team for now. It's definitely on the border between the data-management and distrib areas but I think the data management folks are a better choice to think about this idea first. |
Thanks @DaveCTurner for the help in triaging this. Not sure if that has an impact on responsibilities within the ES team but it's currently an open question whether DLQs should be exclusive to data streams or if they should also apply to regular indices. @jsvd brought up some good arguments in favor of also adding DLQs to regular indices to facilitate non-time series use cases in Logstash and Enterprise search that would benefit from a DLQ. |
DLQs could also help with a lot of the use cases that are mentioned in |
This is great. We have achieved the same thing using a different method. We set ignore_malformed to true on the Filebeat and Elastic Agent integration index templates. Some native index templates will not accept the ignore_malformed true setting, though, so that is a blind spot. We also populate error.message any time a pipeline processor fails, and we enable ignore_failure on all processors. We occasionally enable logging on our agents as well (this is too costly to do all the time). We then have a quality assurance process that searches a * index pattern for the following string:
(message : mapper_parsing_exception) OR (error.message : *) OR (_ignored : *) OR (message : dropping)
This lets us reactively catch and then fix these issues (those we are not blind to, at least). We have been shocked, though, to see the volume of these issues coming from native index template settings in new Filebeat modules and Agent integrations. So we are asking the question: is the root issue a lack of discipline in aligning with ECS when Elastic builds new modules and integrations? A minority of the issues are not related to ECS alignment but to field char limits, and the affected fields are easily identifiable as ones that need a larger char limit. |
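For readers who want to run a similar check through the search API rather than Kibana, here is a minimal sketch of the same four conditions expressed as a bool query. This is an approximate translation, not the commenter's actual setup: the function name `build_qa_query` is made up for illustration, and a `match` query may not behave identically to the Kibana query string above.

```python
import json


def build_qa_query():
    """Build a search body that matches documents showing any of the
    four ingest-problem signals listed in the comment above.

    Assumption: each `field : value` clause is approximated with a
    `match` query and each `field : *` clause with an `exists` query.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"message": "mapper_parsing_exception"}},
                    {"exists": {"field": "error.message"}},
                    {"exists": {"field": "_ignored"}},
                    {"match": {"message": "dropping"}},
                ],
                # Equivalent of OR: at least one clause has to match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a search request.
    print(json.dumps(build_qa_query(), indent=2))
```

The body would be sent to a search endpoint covering whichever index pattern the QA process targets; the thread does not describe that part, so it is left out here.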
Could you elaborate on which field char limit you are talking about and how you've fixed it in your mapping? We're currently working on improved default mappings for logs that are more resilient, i.e. not prone to mapping conflicts and field explosions. |
Here are some examples of fields that would cause data loss without ignore_malformed set to true when using native index templates. These are due to char limitation issues.
process.command_line.caseless
process.command_line
process.args
winlog.event_data.ObjectProperties
winlog.event_data.AttributeValue
winlog.event_data.TaskContentNew
|
Where does the char limitation come from and what's the default value? Do you have a link handy to the Elasticsearch docs? |
We are finding fields that are set to a 1024 char limit but are commonly populated with far more characters. We are raising this limit to 8191, which is the maximum recommended in the Kibana interface, and so far that has been sufficient.
I had originally posted that the char limitation issue is not due to ECS misalignment, but we now think it is, at least in some instances. If an ECS field is not mapped natively, it is left to dynamic mapping, and dynamic mapping can create a field mapping with a char limit too low for the field. So there seems to be an indirect relationship between the char limit issue and native index templates not aligning with ECS.
|
I still don't understand which char limit you are talking about and what the impact of this is 🙂 Are there any exceptions on ingest that you can share? Are you referring to the |
Correct, ignore_above. From the documentation: "Strings longer than the ignore_above setting will not be indexed or stored." https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html |
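To make the 1024 versus 8191 discussion concrete, here is a small sketch of a keyword mapping with ignore_above and a helper that estimates which values would be skipped. The field name `long_text_field` and both function names are hypothetical, and Python's `len` is only an approximation of how Elasticsearch counts characters.

```python
def keyword_mapping(ignore_above):
    """Return a keyword field mapping with the given ignore_above limit."""
    return {"type": "keyword", "ignore_above": ignore_above}


def exceeds_ignore_above(value, ignore_above):
    """Estimate whether a string is longer than the ignore_above limit.

    Assumption: character count is approximated with len().
    """
    return len(value) > ignore_above


# Hypothetical mapping fragment using the raised limit from the thread.
MAPPINGS = {
    "properties": {
        "long_text_field": keyword_mapping(8191),
    }
}

if __name__ == "__main__":
    sample = "x" * 2000  # e.g. a long command line
    for limit in (1024, 8191):
        skipped = exceeds_ignore_above(sample, limit)
        print(f"ignore_above={limit}: value skipped for indexing = {skipped}")
```

A 2000-character value is over the 1024 limit but under 8191, which matches the commenter's experience that raising the limit was sufficient for their fields.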
Closing this in favor of #95534 |
With Elastic Agent we are fully embracing data streams and the data stream naming scheme. In many scenarios, we control the ingestion data structure and the mappings put in place. But as we encourage everyone to use the data stream naming scheme, and for example put a basic ECS template in place for logs-*-*, conflicts can arise at ingest time. One reason might be that the field foo is an object but someone trying to ingest data sends foo as a keyword.
Currently, Elasticsearch just rejects the data with an error. Instead, it would be nice to be able to configure a dead letter queue where these events end up. This ensures that the client does not have to deal with mapping conflicts and that all data is ingested.
This dead letter queue could be generic or per data stream (up for discussion). An assumption I make is that this dead letter queue by default would not have any mappings specified and queries have to be run with runtime fields.
Users could look at the dead letter queue, use it to debug their ingest pipelines and mappings, and then reindex some of the events from the dead letter queue.
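The behaviour requested above can be sketched in a few lines of plain Python. This is a toy model of the idea only, not an Elasticsearch API: `MappingConflict`, `strict_index`, and `ingest_with_dlq` are hypothetical names, and the "index" here is just a function that rejects documents whose foo field is not an object.

```python
class MappingConflict(Exception):
    """Hypothetical stand-in for the rejection returned on a mapping conflict."""


def strict_index(doc):
    """Toy indexer: accepts a document only if `foo` is an object (dict)."""
    if not isinstance(doc.get("foo"), dict):
        raise MappingConflict("foo is mapped as an object but got a non-object value")
    return doc


def ingest_with_dlq(docs, index_fn=strict_index):
    """Index every document; route rejected ones to a dead letter list.

    Returns (indexed, dead_letters). Each dead letter keeps the original
    event untouched plus the failure reason, so it can be debugged and
    reindexed later.
    """
    indexed, dead_letters = [], []
    for doc in docs:
        try:
            indexed.append(index_fn(doc))
        except MappingConflict as exc:
            dead_letters.append({"original": doc, "error": str(exc)})
    return indexed, dead_letters


if __name__ == "__main__":
    events = [{"foo": {"bar": 1}}, {"foo": "plain keyword"}]
    ok, dlq = ingest_with_dlq(events)
    print(f"indexed: {len(ok)}, dead letters: {len(dlq)}")
```

Storing the original event alongside the error is one possible design; it fits the assumption in the issue that the dead letter queue has no mappings of its own and is queried after the fact.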