ignore_malformed to support ignoring JSON objects ingested into fields of the wrong type #12366
Comments
+1 |
While working on this issue, I found out that it fails on other types too, but for another reason: For example, for integer:
That's happening because, unlike in the So, I think two things must be done: Does this make sense, @clintongormley? I'll happily send a PR for this. |
Sorry for the delayed response, I lost this one in email. @clintongormley I think it is probably worth making the behavior consistent, and it does seem to me that finding an object where a specific piece of data is expected constitutes "malformed" data. @andrestc A PR would be great. |
I want to upvote this issue!
This is not at all what I would expect from the documentation https://www.elastic.co/guide/en/elasticsearch/reference/2.0/ignore-malformed.html; please improve the documentation or fix the behavior (preferred!). @clintongormley "i just realised that the original post refers to a string field, which doesn't support ignore_malformed..." Why should string fields not support ignore_malformed? |
+1 I think much more could be done, e.g. set the field to a default value and add an annotation to the document, so users can see what went wrong. In my case, all documents from Apache logs with "-" in the size field (integer) were ignored. I could tell you 100 stories about why Elasticsearch doesn't accept documents from real data sources (just to mention one more: #3714). I think this problem could be handled much better:
A good example is Logsene: it adds error annotations to failed documents together with the string version of the original source document (@sematext can catch Elasticsearch errors during the indexing process). So at least Logsene users can see failed index operations and the original document in their UI or in Kibana. Thanks to this feature I'm able to report this issue to you. It would be nice if such improvements were available out of the box for all Elasticsearch users. |
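As an illustration of the "default value plus annotation" idea in the comment above, here is a minimal client-side sketch that runs before indexing. It is not an Elasticsearch feature. The field names `size` and `timestamp`, the `EXPECTED` table, and the `ingest_errors` annotation field are all hypothetical, and ISO 8601 strings are assumed for dates.

```python
from datetime import datetime

# Hypothetical expected types per field; a real pipeline would derive
# these from the index mapping instead of hard-coding them.
EXPECTED = {"size": int, "timestamp": "date"}


def sanitize(doc, expected=EXPECTED):
    """Return a copy of doc with malformed values nulled out and annotated."""
    clean = dict(doc)
    errors = []
    for field, kind in expected.items():
        if field not in clean:
            continue
        value = clean[field]
        try:
            if kind == "date":
                # Raises ValueError for "" and TypeError for non-strings.
                datetime.fromisoformat(value)
            else:
                # int("-") raises ValueError; int({}) raises TypeError.
                kind(value)
        except (TypeError, ValueError):
            errors.append(f"{field}: malformed value {value!r}")
            clean[field] = None
    if errors:
        # Hypothetical annotation field so the problem stays visible.
        clean["ingest_errors"] = errors
    return clean
```

With this in place, a log line with "-" as its size or "" as its date is still indexed, and the rejected values can be found later by searching on the annotation field.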
any news here? |
I wish to upvote the issue too. |
Same problem with dates. When adding an object with a field of type "date", in my DB whenever it is empty it's represented as "" (empty string) causing this error:
|
Same problem with me. I'm using the ELK stack in which people may use the same properties but with different types. I don't want those properties to be searchable but I don't want to lose the entity event either. I thought |
We are having issues with this same feature. We have documents that sometimes decide to have objects inside something that was intended to have strings. We would like to not lose the whole document just because one of the nodes of data is malformed. This is the behaviour I expected to get from setting ignore_malformed on the properties, and I would applaud such a feature. |
Hey, I have the same problem. Is there any solution (even if it is a bit hacky) out there? |
Facing this in Elasticsearch 2.3.1. Until this bug is fixed, we should at least have a list of bad fields inside the mapper_parsing_exception error so that the app can choose to remove them. Currently there is no standard field in the error through which these keys can be retrieved: "error":{"type":"mapper_parsing_exception","reason":"object mapping for [A.B.C.D] tried to parse field [D] as object, but found a concrete value"}} The app would have to parse the reason string and extract A.B.C.D, which will fail if the error doc format changes. Additionally, the mapper_parsing_exception error itself must be using different formats for different parsing error scenarios, all of which need to be handled by the app. |
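The fragile string parsing described in the comment above might look like the sketch below. It assumes the error body has exactly the shape quoted in that comment and that the first bracketed token in the reason is the full field path. Both are assumptions about an unstable message format, which is the commenter's point. The helper names `offending_field` and `drop_field` are hypothetical.

```python
import json
import re

# Matches the first [...] token in the reason string.
BRACKETED = re.compile(r"\[([^\]]+)\]")


def offending_field(error_body):
    """Extract the dotted field path from a mapper_parsing_exception body."""
    error = json.loads(error_body).get("error", {})
    if error.get("type") != "mapper_parsing_exception":
        return None
    match = BRACKETED.search(error.get("reason", ""))
    return match.group(1) if match else None


def drop_field(doc, dotted_path):
    """Remove a nested field from doc in place, if it exists."""
    parts = dotted_path.split(".")
    node = doc
    for key in parts[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return doc  # path not present; nothing to remove
    node.pop(parts[-1], None)
    return doc
```

An application could call `offending_field` on a failed response, call `drop_field` on the document, and retry. Any change to the wording of the reason string would silently break this.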
I used a workaround for this matter following the recommendations from Elasticsearch forums and official documentation. Declaring the mapping of the objects you want to index (if you know it), choosing |
for usage as a real log stash I would say something like #12366 (comment) |
Bumping, this issue is preventing a number of my messages from being processed successfully, as a field object is returned as an empty string in rare cases. |
Bump, this is proving to be an extremely tedious (non) feature to work around. |
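I've found a way around this but it comes at a cost. It could be worth it for those like me who are in a situation where intervening directly on your data flow (like checking and fixing the log line yourself before sending it to ES) is something you'd like to avoid in the short term. Set your object's enabled setting to false. This will make the fields non-searchable, though. This isn't too big of an issue in my context because the reason this field is so unpredictable is the reason I need ignore_malformed to begin with, so it's not a particularly useful field to search on anyway, though you still have access to the data when you search for that document using another field. Hope this helps. It certainly saved me a lot of trouble... |
That's a good trick. I'll try that out.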
…On 19 Jan 2017 16:01, "patrick-oyst" ***@***.***> wrote:
I've found a way around this but it comes at a cost. It could be worth it
for those like me who are in a situation where intervening directly on your
data flow (like checking and fixing the log line yourself before sending it
to ES) is something you'd like to avoid in the short term. Set your
object's enabled setting to false. This will make the fields non
searchable though. This isn't too big of an issue in my context because the
reason this field is so unpredictable is the reason I need
ignore_malformed to begin with, so it's not a particularly useful field
to search on anyways, though you still have access to the data when you
search for that document using another field.
Hope this helps. It certainly saved *me* a lot of trouble...
|
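The enabled: false workaround quoted above could be expressed as a mapping like the following sketch. The field names `message` and `payload` are hypothetical, and the request wrapper around the `properties` body differs between Elasticsearch versions, so only the mapping fragment is shown.

```python
import json

# Mapping fragment: "payload" is kept in _source but never parsed or
# indexed, so an object, a string, or "" in that field should not be
# able to cause a mapping conflict. The field names are hypothetical.
mapping = {
    "properties": {
        "message": {"type": "text"},
        "payload": {"type": "object", "enabled": False},
    }
}

# Serialize to the JSON body that would be sent with the mapping request.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

The trade-off is the one described in the comment: nothing under the disabled field can be searched or aggregated, but the document is no longer rejected and the raw value stays retrievable from _source.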
+1 |
+1 |
+1 |
I was referred here after raising #41372. Please, please consider which options actual users have. If "dirty" data is allowed to enter ES (and preferably flagged somehow), I can inspect it, analyze it, find it to test with it, and count it. And I can see that it exists, with the full power of Kibana in particular. If "dirty" data is rejected, I must dig through ES logs with those horrible Java stack traces to find a cryptic error message about bulk post rejects. In most cases I don't even have a clue which data caused the problem or what the problem really is (see my #41372 for an example error; good luck guessing why it happened). Regarding data loss: you fear business decisions made on data with a field missing? Then I may be making those business decisions based on a database that is missing 20% of its records entirely because they were rejected (perhaps due to a minor field that is irrelevant in most cases). And unless I am the ES sysadmin, I won't even know (with dirty data I have a good chance of noticing problematic records while exploring, and I can even run sanity queries). From ELK's own field: Logstash does a very good thing with _grok_parse_failure tags (which can be further improved to differentiate between rules with custom tags). Something is wrong? I see records with those tags and can inspect them, count them, and analyze the situation. |
One issue to consider during implementation if this does get addressed, dynamic templates currently allow this setting even though it is rejected when directly mapping a field.
ok:
The resulting fields are created without issue also. |
I am integrating Elasticsearch via Spring Cloud Alibaba and get the following error:
ES version: 7.6.2
|
@jpountz Are you the decision maker on this? You've argued against it repeatedly, which I disagree with, but if it is what it is then maybe this issue should be closed as a won't-fix? Or am I wrong and it's really still under consideration? |
Pinging @elastic/es-search (Team:Search) |
In Elastic Observability, we're working on making log ingestion more resilient. In that context, we've discussed how to deal with object/scalar conflicts more gracefully and whether it makes sense to prioritize this issue or whether there are other alternatives. A little while ago, Elasticsearch introduced the Therefore, we're considering making With that in mind, are there still use cases for supporting |
@felixbarny - what's the implication there for any existing painless scripts etc which currently rely on iteration over sub-objects? I've not noticed the new setting before so just taking a cursory glance, but are we now expecting all source documents to be flattened everywhere? That sounds like a massive efficiency hit when you need to do selective source document filtering, not to mention a fair amount of data bloat on the wire with deeply nested prefixes being repeated? |
Hey Eric, After #97972 has been implemented, the _source does not need to change. It's just about how the documents are mapped internally. However, as explained in the docs for the If you're unsure if your source documents contain nested or flattened fields, you can use the field API in painless scripts, which is able to access fields in either notation. We're also working on adding support for accessing dotted fields in ingest processors: #96648. But again, you don't need to change the structure of your documents when sending them to Elasticsearch. The idea is that dotted and nested fields are treated equally in all places. Having said that, in OpenTelemetry, all attributes are by definition a flat key/value pair. As we're continuing to improve the support for OpenTelemetry, we may map OTel data with flattened keys.
I'd assume that source filtering using wildcards would still work as expected.
That's fair. But I'd expect compression to mostly take care of that anyway? |
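To make "dotted and nested fields are treated equally" concrete, here is a small illustration of resolving one path against a document written in nested, dotted, or mixed notation. It is plain Python for illustration only. It is not the Painless field API and not how Elasticsearch implements this, and `get_field` is a hypothetical helper.

```python
_MISSING = object()  # sentinel so a stored None is not mistaken for "absent"


def get_field(source, path, default=None):
    """Look up path in a dict whose keys may be nested, dotted, or mixed."""
    if not isinstance(source, dict):
        return default
    if path in source:
        return source[path]
    parts = path.split(".")
    # Try every split point: a (possibly dotted) prefix key holding an
    # object, followed by the remainder of the path inside that object.
    for i in range(1, len(parts)):
        head, rest = ".".join(parts[:i]), ".".join(parts[i:])
        if isinstance(source.get(head), dict):
            found = get_field(source[head], rest, _MISSING)
            if found is not _MISSING:
                return found
    return default
```

The same call, `get_field(doc, "http.response.status")`, then works whether the producer sent `{"http": {"response": {"status": 200}}}` or `{"http.response.status": 200}`.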
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
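This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will reply to you as soon as possible after my vacation ends.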
|
Indexing a document with an object type on a field that has already been mapped as a string type causes MapperParsingException, even if index.mapping.ignore_malformed has been enabled.
Reproducible test case
On Elasticsearch 1.6.0:
Expected behaviour
Indexing a document with an object field where Elasticsearch expected a string field will not fail the whole document when index.mapping.ignore_malformed is enabled. Instead, it will ignore the invalid object field.