
Can't get text on a START OBJECT (please improve error handling for key=>{} case) #41372

Closed
Mekk opened this issue Apr 19, 2019 · 13 comments
Labels
>docs (General docs changes) · help wanted · adoptme · :Search Foundations/Mapping (Index mappings, including merging and defining field types) · Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)

Comments

Mekk commented Apr 19, 2019

What happened

Once more I pushed some Logstash changes, only to watch my logs fill with errors:

  • Logstash side:
[2019-04-19T13:02:13,946][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"664d4df1-1603-487e-89f5-5f5bc81ef959", :_index=>"sfx-0.9.8.1-2019.16", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x22254bd0>], :response=>{"index"=>{"_index"=>"sfx-0.9.8.1-2019.16", "_type"=>"_doc", "_id"=>"664d4df1-1603-487e-89f5-5f5bc81ef959", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [call.component] of type [keyword] in document with id '664d4df1-1603-487e-89f5-5f5bc81ef959'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:2204"}}}}}
  • Elasticsearch side:
[2019-04-19T12:42:42,427][DEBUG][o.e.a.b.TransportShardBulkAction] [logstashdb.test] [sfx-0.9.8-2019.16][1] failed to execute bulk item (index) index {[sfx-0.9.8-2019.16][_doc][4188698c-2348-4c45-b364-0cd512f8126f], source[n/a, actual length: [2.2kb], max length: 2kb]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [call.component] of type [keyword] in document with id '4188698c-2348-4c45-b364-0cd512f8126f'
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:280) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:468) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:485) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:392) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:381) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:465) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:485) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:392) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:381) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:98) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:770) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:747) ~[elasticsearch-7.0.0.jar:7.0.0]

It took me an hour (to set up a dead letter queue and write a Logstash config to pretty-print its contents) to find out what happened. It turned out that I had accidentally constructed

    …
   "call":   { "component":  {} }
}

(where component is usually a short text and is mapped as keyword).
This is the third or fourth time I have run into this, and every time I am deeply confused.
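
For reference, a minimal reproduction sketch (the index name repro-index is hypothetical; the document shape matches the failing event above):

PUT repro-index
{
  "mappings": {
    "properties": {
      "call": {
        "properties": {
          "component": { "type": "keyword" }
        }
      }
    }
  }
}

PUT repro-index/_doc/1
{ "call": { "component": {} } }

The second request fails with the same mapper_parsing_exception ("Can't get text on a START_OBJECT") seen in the logs above.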

What I would be grateful for

There are a few things which seem easy, but would improve the situation very noticeably:

  1. Please log the problematic value.

When a field can't be parsed, the Elasticsearch error message really should quote the problematic value. This would be a giant improvement (not just here, but also in all those bad-number/bad-date/… cases), especially in ELK setups (where there is no easy way to find the event source).

  2. Handle ignore_malformed for keywords

As keyword parsing can fail, it should be possible to write in the mapping
"component": { "type": "keyword", "ignore_malformed": true },
(I tried; my mapping was rejected with a syntax error, so this setting appears
to be illegal for keywords right now. See the numeric-field sketch after this list.)

  3. Consider handling this dictionary case in some non-failing way

Either ignoring it or stringifying it in some way would be welcome.
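
For comparison, a sketch of how this already looks for number-based fields, where ignore_malformed is supported today (index and field names here are hypothetical):

PUT some-index
{
  "mappings": {
    "properties": {
      "retries": { "type": "integer", "ignore_malformed": true }
    }
  }
}

With this mapping, a document whose retries value can't be parsed as an integer is still indexed; only the malformed field is skipped. Suggestion 2 above asks for the same semantics on keyword fields.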

Realistic use case (for the curious)

Sending an empty dictionary instead of text seems strange and unlikely, but fairly simple Logstash code happens to do exactly that. I use

xml {
    …
   force_array => false      # single elements are not wrapped in arrays
   suppress_empty => false   # keep empty elements instead of dropping them
   target => "[@metadata][xml]"
}

(I need suppress_empty disabled because I have some meaningful <flag-like-tags/> to probe for in rules applied later). Then I copy some values; this particular case is related to

mutate {
    copy => { "[@metadata][xml][component]" => "[call][component]" }
}

It works fine for messages where <component> is missing (no such field in the parsed XML, so nothing to copy), and it works fine for messages like

<some-message>
    <component>something</component>
    …

(a proper copy of something), but from time to time something logs

<some-message>
    <component></component>
    …

and boom, I get an empty dictionary.
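
That is, with suppress_empty disabled the xml filter ends up producing roughly:

<component>something</component>   =>   "component" => "something"
<component></component>            =>   "component" => {}

and the mutate copy then faithfully copies the empty hash into [call][component].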

martijnvg (Member) commented:

@Mekk Thanks for reporting this issue. I agree that the error message should be improved, along the lines of: field [call.component] doesn't accept an empty object.

On top of that, as you describe, ignore_malformed would help in this case too. Currently only number-based fields support it, but it looks like support should be extended to the keyword and text field types as well. In the case where the field mapping doesn't allow malformed values, at least an improved error message should be returned.

martijnvg added the :Search Foundations/Mapping (Index mappings, including merging and defining field types) and team-discuss labels on Apr 19, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-search

jtibshirani (Contributor) commented Apr 19, 2019

As a note, this issue is related to #12366, where we decided to add support for ignore_malformed to other field types, and also to make sure ignore_malformed handles object data. It would be good to use the other issue to track these enhancements.

colings86 (Contributor) commented Apr 30, 2019

This issue would be fixed by #12366, but in the meantime it would be good to improve the error message here, especially as the error message would still be relevant when ignore_malformed = false. We will leave this open to track the error-message part.

Mekk (Author) commented Apr 30, 2019

Regarding the error-message fix: PLEASE QUOTE THE PROBLEMATIC VALUE. No matter what. Quote it. Please. Quote it. Escape it if necessary, truncate it if necessary, but quote it. Please.

Mekk (Author) commented Apr 30, 2019

Also, it would not hurt to name the index, and perhaps the mapping.

A suggestion for the error message:

Document rejected due to invalid content: field [call.component] is expected to be of type [keyword] (according to the current mapping of index cdx-2019.13) but can not be converted to this type. Problematic value: '{}' (type: dict). Document id: '4188698c-2348-4c45-b364-0cd512f8126f'. See http://some.where

(the Java stack trace doesn't seem necessary, and the linked help page could explain ignore_malformed, the dead letter queue, and typical incompatibilities…)

In the case of ignore_malformed, a similar note would also be nice (Malformed field stripped…).

sandmannn (Contributor) commented:

Hi, I looked a bit into what has to be adjusted for this change. Since every implementation of FieldMapper has its own realization of parseCreateField() where this issue can occur, it looks like every such implementation should be extended to provide information about the partially read value. For the cases where we can provide information about the read value, we can throw a more specific exception carrying this value, which the generic FieldMapper handler then uses to construct the error message.

Just for the purposes of communication, I pushed a rough draft of this approach to my fork: https://github.com/Sandmannn/elasticsearch/compare/master...Sandmannn:extend-logging-misread?expand=1 It may make sense to align on the approach before starting the refactoring across all the inheritors of FieldMapper. If we validate this approach, I can clean up the draft and create a first PR for this refactoring.
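
A minimal sketch of the idea (the class and accessor names are hypothetical, not necessarily what the linked draft uses):

// Hypothetical exception type: each parseCreateField() implementation would
// throw this on failure, carrying whatever part of the value it managed to
// read, so the generic handler in FieldMapper.parse can quote it.
public class PartialValueParseException extends RuntimeException {
    private final String partialValue;

    public PartialValueParseException(String message, String partialValue) {
        super(message);
        this.partialValue = partialValue;
    }

    public String partialValue() {
        return partialValue;
    }
}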

mayya-sharipova (Contributor) commented:

@sandmannn Thanks for taking a look at this. To start, I would suggest a slightly simpler approach, with code changes only in the FieldMapper:parse function.

In the catch section of this function, it may be enough to look at context.parser.currentToken() and read the value from the parser accordingly. For example, if the token is VALUE_STRING, we can call parser.text() to get the value to display to the user. As another example, if the token is START_OBJECT, we can continue reading until the matching END_OBJECT to build the value for display.
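
A rough sketch of that suggestion as a standalone helper (the method name is hypothetical; the real change would live inside the catch block of FieldMapper.parse):

import java.io.IOException;
import org.elasticsearch.common.xcontent.XContentParser;

// Derive a human-readable preview of the failing value from the parser's
// current token, for inclusion in the MapperParsingException message.
static String describeFailingValue(XContentParser parser) throws IOException {
    XContentParser.Token token = parser.currentToken();
    if (token == XContentParser.Token.VALUE_STRING) {
        return parser.text();          // scalar string: the text is directly available
    }
    if (token == XContentParser.Token.START_OBJECT) {
        parser.skipChildren();         // consume tokens up to the matching END_OBJECT
        return "{...}";                // placeholder preview for object values
    }
    return String.valueOf(token);      // fall back to the token name
}

In the START_OBJECT branch a real implementation would collect the consumed tokens into the preview instead of returning a fixed placeholder.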

sandmannn (Contributor) commented:

@mayya-sharipova that approach is indeed simpler and more elegant. We should follow it if possible.

One small question, though: it implicitly assumes that implementations of FieldMapper:parseCreateField do not call parser.nextToken() (otherwise the parser's current token at the moment the exception is handled can differ from the first token of the JSON representation of the value). I did not find any such calls in the current implementors; any mappers that could potentially span several JSON tokens seem to reimplement the whole FieldMapper:parse method instead, e.g. GeoPointFieldMapper. Can we assume that future implementors will behave the same way?

mayya-sharipova (Contributor) commented:

@sandmannn Right, I think we can assume that future implementations of FieldMapper:parseCreateField will have the same behaviour. We can add a small comment to FieldMapper:parseCreateField saying that implementations of this method must ensure that, on a parsing failure, parser.currentToken() is the failing token.

sandmannn (Contributor) commented:

So I guess I should go ahead and create a PR according to the current plan, right?

jtibshirani added the >docs (General docs changes) label and removed the >enhancement label on May 20, 2019
mayya-sharipova (Contributor) commented:

@sandmannn Thanks, please go ahead and create a PR according to the current plan.

jtibshirani (Contributor) commented:

Closing this out now that #42321 has merged. The PR adds a preview of the field value to the error message. This should help to clarify the confusing error message mentioned in the issue description ("Can't get text on a START OBJECT").

This issue also suggested extending ignore_malformed to more field types and applying it to object input as well. The discussion and implementation of that work are tracked in #12366.

javanna added the Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch) label on Jul 16, 2024