[BUG] Discrepancy in result from _validate/query API and actual query validity #2036

Closed
AmiStrn opened this issue Feb 2, 2022 · 7 comments · Fixed by #2416
Labels
bug (Something isn't working), Indexing & Search

Comments

@AmiStrn
Contributor

AmiStrn commented Feb 2, 2022

Describe the bug
When a query is validated through the _validate/query API and it contains both a query_string clause with allow_leading_wildcard set to false and a range filter in the same bool query, the API reports the query as valid even though the query string starts with a leading wildcard.

To Reproduce
Steps to reproduce the behavior:

  1. Send the following request to a relevant index:
    url: localhost:9200/my-index/_validate/query
    body:
{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "abc": {
                            "gte": 5,
                            "lte": 8
                        }
                    }
                },
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

response:

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": true
}

Notice "valid": true even though the query explicitly sets "allow_leading_wildcard": false.
2. Now send the same query body with only the range filter removed:

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

response:

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": false
}

Notice that this time the response is "valid": false, as expected.
3. Sending the original query (the one that includes the range filter) as an actual search fails, even though validation reported it as valid:

{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "Failed to parse query [*cash*]",
                "index_uuid": "########",
                "index": "my-index"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "my-index",
                "node": "###############",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "Failed to parse query [*cash*]",
                    "index_uuid": "###############",
                    "index": "my-index",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "Cannot parse '*cash*': '*' or '?' not allowed as first character in WildcardQuery",
                        "caused_by": {
                            "type": "parse_exception",
                            "reason": "'*' or '?' not allowed as first character in WildcardQuery"
                        }
                    }
                }
            }
        ]
    },
    "status": 400
}

Expected behavior
The range filter should not affect validation when the query fails on the leading wildcard. The API should return "valid": false in both cases.

Host/Environment (please complete the following information):

  • Version 1.2.3

Additional context
This seems to be a bug dating back to at least Elasticsearch 6.8.
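
The two validation requests from the steps above can be scripted for side-by-side comparison. This is a minimal sketch, assuming a node reachable at http://localhost:9200 and an index named my-index as in the report; the helper names build_query and validate are hypothetical and not part of any client library.

```python
import json
import urllib.request


def build_query(include_range: bool) -> dict:
    """Build the bool query from the report, with or without the range clause."""
    must = []
    if include_range:
        must.append({"range": {"abc": {"gte": 5, "lte": 8}}})
    must.append(
        {"query_string": {"query": "*cash*", "allow_leading_wildcard": False}}
    )
    return {"query": {"bool": {"must": must}}}


def validate(base_url: str, index: str, body: dict) -> bool:
    """POST the body to <index>/_validate/query and return its "valid" flag."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_validate/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["valid"]


# Example usage against a local node (assumed address, not executed here):
# for include_range in (True, False):
#     body = build_query(include_range)
#     print(include_range, validate("http://localhost:9200", "my-index", body))
```

If the behavior described above is present, the two calls would be expected to disagree, although both bodies contain the same disallowed leading wildcard.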

AmiStrn added the bug and untriaged labels Feb 2, 2022
@AmiStrn
Contributor Author

AmiStrn commented Feb 2, 2022

@MarinaRazumovsky let me know if there is anything I should add here to the bug you found

@reta
Collaborator

reta commented Feb 16, 2022

@AmiStrn I started looking at this but am having difficulty reproducing the issue. Could you please share the my-index mappings and settings (at least part of them) so I can replicate the exact scenario? Thank you!

@AmiStrn
Contributor Author

AmiStrn commented Feb 19, 2022

Thanks for looking into this. I will check whether this happens only with a specific type of mapping.

@AmiStrn
Contributor Author

AmiStrn commented Mar 1, 2022

@reta I tested again and found that this is indeed a bug.
I made a fresh clone of the repo, ran ./gradlew run, added some data to a new index in the cluster, and followed the reproduction steps above. I got the same bug.
Here is the mapping (even though I did nothing but add the index and some simple docs):

{
	"twitter": {
		"aliases": {},
		"mappings": {
			"properties": {
				"metric": {
					"type": "long"
				},
				"name": {
					"type": "text",
					"fields": {
						"keyword": {
							"type": "keyword",
							"ignore_above": 256
						}
					}
				}
			}
		},
		"settings": {
			"index": {
				"creation_date": "1646128753579",
				"number_of_shards": "1",
				"number_of_replicas": "1",
				"uuid": "s9aLnTXTQN-KQMJ0kgHH2A",
				"version": {
					"created": "136217827"
				},
				"provided_name": "twitter"
			}
		}
	}
}

@reta
Collaborator

reta commented Mar 7, 2022

Oh @AmiStrn, I now understand the problem clearly: the problem is not the range query itself but that it references a non-existent property, abc:

"range": {
    "abc": {
        "gte": 5,
        "lte": 8
    }
}

If you replace abc with an existing property, for example metric, you get the desired result:

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "metric": {
                            "gte": 5,
                            "lte": 8
                        }
                    }
                },
                {
                    "query_string": {
                        "query": "*cash*",
                        "allow_leading_wildcard": false
                    }
                }
            ]
        }
    }
}

response:

{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": false
}

I will try to find out what is going on here.
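
Since the discrepancy appears when a range clause references a property that is missing from the mapping, a client-side pre-check can flag such fields before trusting the validation result. This is a rough sketch, not part of the API: the function names are hypothetical, it only looks at top-level mapping properties (no nested objects or multi-fields), and it assumes no field is literally named "range".

```python
def range_fields(query) -> set:
    """Recursively collect the field names used in "range" clauses."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "range" and isinstance(value, dict):
                    # The keys of a range clause are the targeted field names.
                    found.update(value.keys())
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(query)
    return found


def unmapped_range_fields(query: dict, mapping_properties: dict) -> set:
    """Return range fields that do not appear in the mapping's properties."""
    return range_fields(query) - set(mapping_properties)
```

With the mapping shared earlier in this thread (properties metric and name), the query that uses abc would be flagged, while the variant that uses metric would not.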

@AmiStrn
Contributor Author

AmiStrn commented Mar 7, 2022

Maybe it also happens with dates? We discovered this using a valid field (our timestamp field). I will check again using the same setup.

@reta
Collaborator

reta commented Mar 7, 2022

@AmiStrn I have even more details for you: the validation result depends on what is stored in the index (I didn't know that!). At least for date fields, there is an optimization that checks whether the index contains any document whose value for the property falls in the requested range. If there is none, the query is rewritten as a MatchNoneQuery, and that rewritten query is validated instead of the original one (so validation passes for basically any kind of constraint). But once new data gets in, both the validation and the query may start to fail right away.
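
The effect described here can be illustrated with a toy model. This is not OpenSearch code: every class below is hypothetical, the index is reduced to a dict of field values, and the model assumes that a bool query collapses to match-none as soon as one of its must clauses does. It only shows why validating the rewritten query can hide a parse error in a sibling clause.

```python
from dataclasses import dataclass


class ParseError(Exception):
    """Stands in for the parse_exception raised on a leading wildcard."""


@dataclass
class MatchNone:
    def rewrite(self, index_values):
        return self

    def validate(self):
        pass  # nothing left to parse, so validation always passes


@dataclass
class QueryString:
    query: str
    allow_leading_wildcard: bool = True

    def rewrite(self, index_values):
        return self

    def validate(self):
        if not self.allow_leading_wildcard and self.query[:1] in ("*", "?"):
            raise ParseError("'*' or '?' not allowed as first character")


@dataclass
class Range:
    field: str
    gte: int
    lte: int

    def rewrite(self, index_values):
        # Data-dependent shortcut: no value in range means nothing can match.
        values = index_values.get(self.field, [])
        if not any(self.gte <= v <= self.lte for v in values):
            return MatchNone()
        return self

    def validate(self):
        pass


@dataclass
class BoolMust:
    clauses: list

    def rewrite(self, index_values):
        rewritten = [c.rewrite(index_values) for c in self.clauses]
        # Assumption of this model: one match-none clause collapses the bool.
        if any(isinstance(c, MatchNone) for c in rewritten):
            return MatchNone()
        return BoolMust(rewritten)

    def validate(self):
        for clause in self.clauses:
            clause.validate()


def is_valid(query, index_values) -> bool:
    """Validate the rewritten query, mirroring the behavior described above."""
    try:
        query.rewrite(index_values).validate()
        return True
    except ParseError:
        return False
```

In this model the same query flips from valid to invalid as soon as the index holds a value inside the requested range, which matches the observation that validation may start to fail once new data arrives.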

3 participants