[BUG] Discrepancy in result from _validate/query API and actual query validity #2036

AmiStrn · 2022-02-02T12:18:33Z

Describe the bug
When validating a query with the _validate/query API, if the query contains both allow_leading_wildcard = false and a range filter inside the bool query, the result is "valid": true even when the query string has a leading wildcard.

To Reproduce
Steps to reproduce the behavior:

1. Send the following request to a relevant index:
url: localhost:9200/my-index/_validate/query
body:

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "abc": {
                            "gte": 5,
                            "lte": 8
                        }
                    }
                },
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

response:

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": true
}

Notice "valid": true even though the query explicitly sets "allow_leading_wildcard": false.

2. Now send the same query body with only the range filter removed:

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

response:

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": false
}

Notice that this time the response is "valid": false, as expected.

3. The original query, with the range filter, fails when it is actually executed, even though validation reported it as valid:

{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "Failed to parse query [*cash*]",
                "index_uuid": "########",
                "index": "my-index"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "my-index",
                "node": "###############",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "Failed to parse query [*cash*]",
                    "index_uuid": "###############",
                    "index": "my-index",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "Cannot parse '*cash*': '*' or '?' not allowed as first character in WildcardQuery",
                        "caused_by": {
                            "type": "parse_exception",
                            "reason": "'*' or '?' not allowed as first character in WildcardQuery"
                        }
                    }
                }
            }
        ]
    },
    "status": 400
}

Expected behavior
The range filter should not affect validation of the query when the wildcard check fails. The response should be "valid": false in both cases.
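The comparison in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: the helper names (build_query, post_json, is_discrepancy, compare) are hypothetical, and it assumes an unsecured cluster reachable over plain HTTP, for example at http://localhost:9200.

```python
import json
import urllib.error
import urllib.request


def build_query(range_field=None):
    """Build the bool query from the report; omit the range clause when range_field is None."""
    must = []
    if range_field is not None:
        must.append({"range": {range_field: {"gte": 5, "lte": 8}}})
    must.append({"query_string": {"query": "*cash*", "allow_leading_wildcard": False}})
    return {"query": {"bool": {"must": must}}}


def post_json(url, body):
    """POST a JSON body and return (HTTP status, decoded JSON response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Error responses (such as the 400 shown above) still carry a JSON body.
        return err.code, json.load(err)


def is_discrepancy(validate_response, search_status):
    """True when _validate/query reports valid but _search rejected the same query."""
    return bool(validate_response.get("valid")) and search_status >= 400


def compare(base_url, index, range_field):
    """Send the same body to _validate/query and _search and report any mismatch."""
    body = build_query(range_field)
    _, validation = post_json(f"{base_url}/{index}/_validate/query", body)
    status, _ = post_json(f"{base_url}/{index}/_search", body)
    return is_discrepancy(validation, status)
```

Calling compare("http://localhost:9200", "my-index", "abc") would return True on a cluster that shows the behavior described here, and False once validation and execution agree.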

Host/Environment (please complete the following information):

Version 1.2.3

Additional context
This seems to be a bug dating back to at least Elasticsearch 6.8.


AmiStrn · 2022-02-02T12:19:58Z

@MarinaRazumovsky let me know if there is anything I should add here to the bug you found

reta · 2022-02-16T14:59:40Z

@AmiStrn I started looking at this but am having difficulty reproducing the issue. Could you please share the my-index mappings and settings (at least part of them) so I can replicate the exact scenario? Thank you!

AmiStrn · 2022-02-19T21:45:56Z

Thanks for looking into this. I will validate if this is only on a specific type of mapping.

AmiStrn · 2022-03-01T10:26:11Z

@reta I tested again and found that this is indeed a bug:
I made a fresh clone of the repo, ran ./gradlew run, added some data to a new index in the cluster, and followed the reproduction steps above. I got the same bug.
Here is the mapping (even though I did nothing but add the index and some simple docs):

{
	"twitter": {
		"aliases": {},
		"mappings": {
			"properties": {
				"metric": {
					"type": "long"
				},
				"name": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				}
			}
		},
		"settings": {
			"index": {
				"creation_date": "1646128753579",
				"number_of_shards": "1",
				"number_of_replicas": "1",
				"uuid": "s9aLnTXTQN-KQMJ0kgHH2A",
				"version": {
					"created": "136217827"
				},
				"provided_name": "twitter"
			}
		}
	}
}

reta · 2022-03-07T17:07:52Z

Oh @AmiStrn, I now understand the problem clearly: the problem is not the range query itself, but that the range query references a non-existent property, abc.

If you replace abc with an existing property, e.g. metric, you get the desired result:

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "metric": {
                            "gte": 5,
                            "lte": 8
                        }
                    }
                },
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": false
}

Will try to find out what is going on here

AmiStrn · 2022-03-07T17:32:44Z

Maybe it also happens with dates? We discovered this using a valid field (our timestamp field). I will check again using the same setup.

reta · 2022-03-07T20:34:06Z

@AmiStrn I have even more details for you: validation depends on what is stored in the index (I didn't know that!). At least for date fields, there is an optimization that checks whether the index contains any document whose value for the property falls within the required range. If there is none, the query is rewritten as a MatchNoneQuery, and that rewritten query is validated instead of the original one (so validation passes for basically any type of constraint). But if new data comes in, the validation and the query may start to fail right away.
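To make the masking effect concrete, here is a toy model in Python. It is only an illustration of the behavior described above, not OpenSearch code: the classes and functions (RangeQuery, QueryString, BoolMust, MatchNone, check, rewrite) are hypothetical, and the rewrite rule is a simplified assumption that collapses the whole bool query whenever a range clause cannot match any stored value.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class RangeQuery:
    field: str
    gte: int
    lte: int


@dataclass
class QueryString:
    query: str
    allow_leading_wildcard: bool = True


@dataclass
class BoolMust:
    clauses: List[Union[RangeQuery, QueryString]]


@dataclass
class MatchNone:
    pass


def check(query) -> bool:
    """Return True if the query parses; a leading wildcard fails when it is disallowed."""
    if isinstance(query, QueryString):
        leading = query.query[:1] in ("*", "?")
        return query.allow_leading_wildcard or not leading
    if isinstance(query, BoolMust):
        return all(check(clause) for clause in query.clauses)
    return True  # RangeQuery and MatchNone always parse in this toy model


def rewrite(query, values_by_field: Dict[str, List[int]]):
    """Collapse a bool/must to MatchNone when a range clause matches no stored value."""
    if isinstance(query, BoolMust):
        for clause in query.clauses:
            if isinstance(clause, RangeQuery):
                values = values_by_field.get(clause.field, [])
                if not any(clause.gte <= v <= clause.lte for v in values):
                    return MatchNone()
    return query


def validate_after_rewrite(query, values_by_field) -> bool:
    """Validate the rewritten query: the wildcard clause may have been discarded."""
    return check(rewrite(query, values_by_field))


def validate_before_rewrite(query, values_by_field) -> bool:
    """Validate the original query: independent of what the index contains."""
    return check(query)


if __name__ == "__main__":
    stored = {"metric": [6]}  # toy index contents; "abc" is not present
    wildcard = QueryString("*cash*", allow_leading_wildcard=False)
    for field in ("abc", "metric"):
        query = BoolMust([RangeQuery(field, 5, 8), wildcard])
        print(field, validate_after_rewrite(query, stored), validate_before_rewrite(query, stored))
```

In this model, validating after the rewrite reports the abc query as valid because the wildcard clause is gone by the time it is checked, while validating the original query rejects it regardless of the stored data.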

AmiStrn added bug (Something isn't working) and untriaged labels Feb 2, 2022

anasalkouz added Indexing & Search and removed untriaged labels Feb 8, 2022

reta mentioned this issue Mar 9, 2022

Discrepancy in result from _validate/query API and actual query validity #2416

Merged

5 tasks

setiah closed this as completed in #2416 Mar 14, 2022
