
Can't get text on a START OBJECT (please improve error handling for key=>{} case) #41372

Closed
Mekk opened this issue Apr 19, 2019 · 13 comments
Labels
>docs (General docs changes) · help wanted · adoptme · :Search Foundations/Mapping (Index mappings, including merging and defining field types) · Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)

Comments

Mekk commented Apr 19, 2019

What happened

Once more I pushed some Logstash changes, only to watch my logs fill with errors:

  • Logstash side:
[2019-04-19T13:02:13,946][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"664d4df1-1603-487e-89f5-5f5bc81ef959", :_index=>"sfx-0.9.8.1-2019.16", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x22254bd0>], :response=>{"index"=>{"_index"=>"sfx-0.9.8.1-2019.16", "_type"=>"_doc", "_id"=>"664d4df1-1603-487e-89f5-5f5bc81ef959", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [call.component] of type [keyword] in document with id '664d4df1-1603-487e-89f5-5f5bc81ef959'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:2204"}}}}}
  • Elasticsearch side:
[2019-04-19T12:42:42,427][DEBUG][o.e.a.b.TransportShardBulkAction] [logstashdb.test] [sfx-0.9.8-2019.16][1] failed to execute bulk item (index) index {[sfx-0.9.8-2019.16][_doc][4188698c-2348-4c45-b364-0cd512f8126f], source[n/a, actual length: [2.2kb], max length: 2kb]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [call.component] of type [keyword] in document with id '4188698c-2348-4c45-b364-0cd512f8126f'
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:280) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:468) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:485) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:392) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:381) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:465) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:485) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:392) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:381) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:98) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:770) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:747) ~[elasticsearch-7.0.0.jar:7.0.0]

It took me an hour (to set up a dead letter queue and write a Logstash config to pretty-print its contents) to find out what happened. It turned out that I had accidentally constructed

    …
   "call":   { "component":  {} }
}

(where component is usually a short text and is mapped as keyword).
This is the third or fourth time I have run into this, and every time I am deeply confused.
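
For reference, a minimal reproduction sketch (the index name repro-index is hypothetical; the document shape matches the failing event above):

PUT repro-index
{
  "mappings": {
    "properties": {
      "call": {
        "properties": {
          "component": { "type": "keyword" }
        }
      }
    }
  }
}

PUT repro-index/_doc/1
{ "call": { "component": {} } }

The second request fails with the same mapper_parsing_exception ("Can't get text on a START_OBJECT") seen in the logs above.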

What I would be grateful for

There are a few things which seem easy, but would improve the situation very noticeably:

  1. Please log the problematic value.

When a field can't be parsed, the Elasticsearch error message really should quote the problematic value. This would be a giant improvement (not just here, but also in all those bad-number/bad-date/… cases), especially in ELK setups (where there is no easy way to find the event source).

  2. Handle ignore_malformed for keywords

As keyword parsing can fail, it should be possible to write in the mapping
"component": { "type": "keyword", "ignore_malformed": true },
(I tried; my mapping was rejected with a syntax error, so this setting appears
to be illegal for keywords right now. See the numeric-field sketch after this list.)

  3. Consider handling this dictionary case in some non-failing way

Either ignoring it or stringifying it in some way would be welcome.
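
For comparison, a sketch of how this already looks for number-based fields, where ignore_malformed is supported today (index and field names here are hypothetical):

PUT some-index
{
  "mappings": {
    "properties": {
      "retries": { "type": "integer", "ignore_malformed": true }
    }
  }
}

With this mapping, a document whose retries value can't be parsed as an integer is still indexed; only the malformed field is skipped. Suggestion 2 above asks for the same semantics on keyword fields.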

Realistic use case (for the curious)

Sending an empty dictionary instead of text seems strange and unlikely, but fairly simple Logstash code happens to do exactly that. I use

xml {
    …
   force_array => false      # single elements are not wrapped in arrays
   suppress_empty => false   # keep empty elements instead of dropping them
   target => "[@metadata][xml]"
}

(I need suppress_empty disabled because I have some meaningful <flag-like-tags/> to probe for in rules applied later). Then I copy some values; this particular case is related to

mutate {
    copy => { "[@metadata][xml][component]" => "[call][component]" }
}

It works fine for messages where <component> is missing (no such field in the parsed XML, so nothing to copy), and it works fine for messages like

<some-message>
    <component>something</component>
    …

(a proper copy of something), but from time to time something logs

<some-message>
    <component></component>
    …

and boom, I get an empty dictionary.
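
That is, with suppress_empty disabled the xml filter ends up producing roughly:

<component>something</component>   =>   "component" => "something"
<component></component>            =>   "component" => {}

and the mutate copy then faithfully copies the empty hash into [call][component].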

martijnvg (Member) commented:

@Mekk Thanks for reporting this issue. I agree that the error message should be improved, along the lines of: field [call.component] doesn't accept an empty object.

On top of that, as you describe, ignore_malformed would help in this case too. Currently only number-based fields support it, but it looks like support should be extended to the keyword and text field types as well. In the case where the field mapping doesn't allow malformed values, at least an improved error message should be returned.

martijnvg added the :Search Foundations/Mapping (Index mappings, including merging and defining field types) and team-discuss labels on Apr 19, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-search

jtibshirani (Contributor) commented Apr 19, 2019

As a note, this issue is related to #12366, where we decided to add support for ignore_malformed to other field types, and also to make sure ignore_malformed handles object data. It would be good to use the other issue to track these enhancements.

colings86 (Contributor) commented Apr 30, 2019

This issue would be fixed by #12366, but in the meantime it would be good to improve the error message here, especially as the error message would still be relevant when ignore_malformed = false. We will leave this open to track the error-message part.

Mekk (Author) commented Apr 30, 2019

Regarding the error-message fix: PLEASE QUOTE THE PROBLEMATIC VALUE. No matter what. Quote it. Please. Quote it. Escape it if necessary, truncate it if necessary, but quote it. Please.

Mekk (Author) commented Apr 30, 2019

Also, it would not hurt to name the index, and perhaps the mapping.

A suggestion for the error message:

Document rejected due to invalid content: field [call.component] is expected to be of type [keyword] (according to the current mapping of index cdx-2019.13) but can not be converted to this type. Problematic value: '{}' (type: dict). Document id: '4188698c-2348-4c45-b364-0cd512f8126f'. See http://some.where

(the Java stack trace doesn't seem necessary, and the linked help page could explain ignore_malformed, the dead letter queue, and typical incompatibilities…)

In the case of ignore_malformed, a similar note would also be nice (Malformed field stripped…).

sandmannn (Contributor) commented:

Hi, I looked a bit into what has to be adjusted for this change. Since every implementation of FieldMapper has its own realization of parseCreateField() where this issue can occur, it looks like every such implementation should be extended to provide information about the partially read value. For the cases where we can provide information about the read value, we can throw a more specific exception carrying this value, which the generic FieldMapper handler then uses to construct the error message.

Just for the purposes of communication, I pushed a rough draft of this approach to my fork: https://github.com/Sandmannn/elasticsearch/compare/master...Sandmannn:extend-logging-misread?expand=1 It may make sense to align on the approach before starting the refactoring across all the inheritors of FieldMapper. If we validate this approach, I can clean up the draft and create a first PR for this refactoring.
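
A minimal sketch of the idea (the class and accessor names are hypothetical, not necessarily what the linked draft uses):

// Hypothetical exception type: each parseCreateField() implementation would
// throw this on failure, carrying whatever part of the value it managed to
// read, so the generic handler in FieldMapper.parse can quote it.
public class PartialValueParseException extends RuntimeException {
    private final String partialValue;

    public PartialValueParseException(String message, String partialValue) {
        super(message);
        this.partialValue = partialValue;
    }

    public String partialValue() {
        return partialValue;
    }
}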

mayya-sharipova (Contributor) commented:

@sandmannn Thanks for taking a look at this. To start, I would suggest a slightly simpler approach, with code changes only in the FieldMapper:parse function.

In the catch section of this function, it may be enough to look at context.parser.currentToken() and read the value from the parser accordingly. For example, if the token is VALUE_STRING, we can call parser.text() to get the value to display to the user. As another example, if the token is START_OBJECT, we can continue reading until the matching END_OBJECT to build the value for display.
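
A rough sketch of that suggestion as a standalone helper (the method name is hypothetical; the real change would live inside the catch block of FieldMapper.parse):

import java.io.IOException;
import org.elasticsearch.common.xcontent.XContentParser;

// Derive a human-readable preview of the failing value from the parser's
// current token, for inclusion in the MapperParsingException message.
static String describeFailingValue(XContentParser parser) throws IOException {
    XContentParser.Token token = parser.currentToken();
    if (token == XContentParser.Token.VALUE_STRING) {
        return parser.text();          // scalar string: the text is directly available
    }
    if (token == XContentParser.Token.START_OBJECT) {
        parser.skipChildren();         // consume tokens up to the matching END_OBJECT
        return "{...}";                // placeholder preview for object values
    }
    return String.valueOf(token);      // fall back to the token name
}

In the START_OBJECT branch a real implementation would collect the consumed tokens into the preview instead of returning a fixed placeholder.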

sandmannn (Contributor) commented:

@mayya-sharipova that approach is indeed simpler and more elegant. We should follow it if possible.

One small question, though: it implicitly assumes that implementations of FieldMapper:parseCreateField do not call parser.nextToken() (otherwise the parser's current token at the moment the exception is handled can differ from the first token of the JSON representation of the value). I did not find any such calls in the current implementors; any mappers that could potentially span several JSON tokens seem to reimplement the whole FieldMapper:parse method instead, e.g. GeoPointFieldMapper. Can we assume that future implementors will behave the same way?

mayya-sharipova (Contributor) commented:

@sandmannn Right, I think we can assume that future implementations of FieldMapper:parseCreateField will have the same behaviour. We can add a small comment to FieldMapper:parseCreateField saying that implementations of this method must ensure that, on a parsing failure, parser.currentToken() is the failing token.

sandmannn (Contributor) commented:

So I guess I should go ahead and create a PR according to the current plan, right?

jtibshirani added the >docs (General docs changes) label and removed the >enhancement label on May 20, 2019
mayya-sharipova (Contributor) commented:

@sandmannn Thanks, please go ahead and create a PR according to the current plan.

jtibshirani (Contributor) commented:

Closing this out now that #42321 has merged. The PR adds a preview of the field value to the error message. This should help to clarify the confusing error message mentioned in the issue description ("Can't get text on a START OBJECT").

This issue also suggested extending ignore_malformed to more field types and applying it to object input as well. The discussion and implementation of that work are tracked in #12366.

javanna added the Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch) label on Jul 16, 2024