Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy in result from _validate/query API and actual query validity #2416

Merged
merged 2 commits into from
Mar 14, 2022

Conversation

reta
Copy link
Collaborator

@reta reta commented Mar 9, 2022

Signed-off-by: Andriy Redko [email protected]

Description

It turned out, the validation is subject to what is stored in the index. At least for date properties there is an optimization which basically says if the index contains any document with the property value in required range or not. If none - the query is rewritten as MatchNoneQuery and that query is being validated instead of original one (in this case, the validation passes successfully for basically any type of queries or constraints). However, as the new data gets in, the validation and query may start to fail at any moment.

The suggested solution for validation flow is to not apply the optimization for date properties but always assume there is a match. As such, the query search/shard contexts have been enriched with new property validate(): it is set to true only when query validation request is served.

Note: Another variation of this issue is when query contains properties which do not exist in the index mapping at all. In this case, the validate/search behaviour is consistent: the query is always valid and always returns no results.

Issues Resolved

Fixes #2036

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@reta
Copy link
Collaborator Author

reta commented Mar 9, 2022

@AmiStrn have a solution for you :-)

@opensearch-ci-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@AmiStrn
Copy link
Contributor

AmiStrn commented Mar 9, 2022

@AmiStrn have a solution for you :-)

Wow, this is awesome! Nice catch. Thanks @reta , i will run this branch to validate that it fixed the issue on our end as well.

@reta reta marked this pull request as ready for review March 9, 2022 14:44
@reta reta requested a review from a team as a code owner March 9, 2022 14:44
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success b949bb5
Log 3199

Reports 3199

@dblock dblock requested a review from andrross March 9, 2022 15:22
@@ -451,6 +451,11 @@ public String getWriteableName() {
return MappedFieldType.Relation.INTERSECTS;
}

// For validation, always assume that there is an intersection
if (shardContext.validate()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we validating the range is in valid DateTimeFormat (can be parsed as such) before bypassing the validation to the other part of the query with INTERSECT? I could be wrong, but seems like the dateMathParser below does some validation for the date format here. In case of invalid range, the query should be marked invalid. Couldn't find a test case for this scenario or maybe I missed? Just want to make sure we are not ignoring the case where timestamp is invalid.

@@ -451,6 +451,11 @@ public String getWriteableName() {
return MappedFieldType.Relation.INTERSECTS;
}

// For validation, always assume that there is an intersection
if (shardContext.validate()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we validating the range is in valid DateTimeFormat (can be parsed as such) before bypassing the validation to the other part of the query with INTERSECT? I could be wrong, but seems like the dateMathParser below does some validation for the date format here. In case of invalid range, the query should be marked invalid. Couldn't find a test case for this scenario or maybe I missed? Just want to make sure we are not ignoring the case where timestamp is invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setiah thanks for looking, yeah, it does not validate but probably should, updated the pull request accordingly, thanks!

Copy link
Contributor

@setiah setiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested solution for validation flow is to not apply the optimization for date properties but always assume there is a match

This makes sense. Just have one small comment (question) and thanks for the changes!

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success e4d5611
Log 3268

Reports 3268

Copy link
Contributor

@setiah setiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta, LGTM!

@setiah setiah merged commit 5c0f9bc into opensearch-project:main Mar 14, 2022
@opensearch-trigger-bot
Copy link
Contributor

The backport to 1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.x 1.x
# Navigate to the new working tree
cd .worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-2416-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5c0f9bc499c5c4a744d2f29fb5bd9eab4aabc004
# Push it to GitHub
git push --set-upstream origin backport/backport-2416-to-1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.x

Then, create a pull request where the base branch is 1.x and the compare/head branch is backport/backport-2416-to-1.x.

reta added a commit to reta/OpenSearch that referenced this pull request Mar 15, 2022
…ity (opensearch-project#2416)

* Discrepancy in result from _validate/query API and actual query validity

Signed-off-by: Andriy Redko <[email protected]>

* Moved the validate() check later into the flow to allow range validation to trigger first

Signed-off-by: Andriy Redko <[email protected]>
reta added a commit to reta/OpenSearch that referenced this pull request Mar 15, 2022
…ity (opensearch-project#2416)

* Discrepancy in result from _validate/query API and actual query validity

Signed-off-by: Andriy Redko <[email protected]>

* Moved the validate() check later into the flow to allow range validation to trigger first

Signed-off-by: Andriy Redko <[email protected]>
dblock pushed a commit that referenced this pull request Mar 15, 2022
…ity (#2416) (#2466)

* Discrepancy in result from _validate/query API and actual query validity

Signed-off-by: Andriy Redko <[email protected]>

* Moved the validate() check later into the flow to allow range validation to trigger first

Signed-off-by: Andriy Redko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Discrepancy in result from _validate/query API and actual query validity
4 participants