Schema matching fails on query with negated varstring and clpstring filters on same key #254

Closed
gibber9809 opened this issue Jan 28, 2024 · 2 comments · Fixed by #263
Labels
bug Something isn't working

Comments

gibber9809 commented Jan 28, 2024

Bug

For the following data:

{"a": "clp string"}
{"a": "string clp"}

The query NOT a: "clp string" will return the second record, but the query NOT (a: "clp string" OR a: "b") will fail schema matching and return no records.

The AST for the failing case is AndExpr(!FilterExpr(EQ, ColumnDescriptor<clpstring,array>("a"), "clp string"), !FilterExpr(EQ, ColumnDescriptor<varstring,array>("a"), "b"))

The problem is that, since the dataset contains no varstring column, the second filter fails column matching and gets replaced with EmptyExpr, and the entire AndExpr then gets constant-propagated away. This bug is a bit annoying because, while the varstring column "a" doesn't exist, the JSON string column "a" does exist.
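
To make the failure mode concrete, here is a minimal toy model of it. The classes and the functions match_columns and propagate are hypothetical stand-ins written for this sketch, not CLP's real AST or passes; the only behaviour taken from the description above is that a filter on a missing typed column becomes an empty expression and that an AND containing an empty expression is dropped.

```python
from dataclasses import dataclass


# Toy stand-ins for the AST nodes named above; these are NOT CLP's real classes.
@dataclass(frozen=True)
class Filter:
    key: str
    col_type: str  # "clpstring" or "varstring"
    value: str
    negated: bool = False


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Empty:
    pass


def match_columns(expr, schema):
    """Replace any filter whose typed column is absent from the schema with Empty.

    Note that the negation flag is ignored here, which is the essence of the bug.
    """
    if isinstance(expr, Filter):
        return expr if (expr.key, expr.col_type) in schema else Empty()
    if isinstance(expr, And):
        return And(tuple(match_columns(c, schema) for c in expr.children))
    return expr


def propagate(expr):
    """Collapse an And that has any Empty child into Empty."""
    if isinstance(expr, And):
        children = tuple(propagate(c) for c in expr.children)
        if any(isinstance(c, Empty) for c in children):
            return Empty()
        return And(children)
    return expr


# The example dataset only has a clpstring column "a".
schema = {("a", "clpstring")}
failing = And((
    Filter("a", "clpstring", "clp string", negated=True),
    Filter("a", "varstring", "b", negated=True),
))
result = propagate(match_columns(failing, schema))
```

In this model the negated varstring filter is discarded even though it should be trivially true for every record where "a" is a clpstring, so the whole query collapses to an empty expression.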

The expected behaviour is to treat the varstring filter as false on the clpstring column "a" when that column does exist (and the inverted filter as true).

Actually, for every string filter that matches strictly varstring or strictly clpstring, there should be an implicit negated condition on the existence of the other type.

For example a<clpstring> NEQ "a b" -> a<clpstring> NEQ "a b" OR EXISTS a<varstring>
and similarly a<varstring> NEQ "b" -> a<varstring> NEQ "b" OR EXISTS a<clpstring>.
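
As a rough sketch of what this rewrite would mean semantically, the snippet below evaluates the augmented filters directly against the example records. Everything in it is hypothetical and for illustration only: string_type assumes that a string containing a space is a clpstring and anything else is a varstring (inferred from the examples in this issue), and the helper names are not CLP functions.

```python
def string_type(value):
    # Assumption for this sketch: strings with a space are clpstring, others varstring.
    return "clpstring" if " " in value else "varstring"


OTHER = {"clpstring": "varstring", "varstring": "clpstring"}


def exists(record, key, col_type):
    """EXISTS key<col_type>: the key is present and its value has that type."""
    return key in record and string_type(record[key]) == col_type


def neq(record, key, col_type, value):
    """Typed NEQ: true only when the key exists with this type and the value differs."""
    return exists(record, key, col_type) and record[key] != value


def augmented_neq(record, key, value):
    """a<T> NEQ v  ->  a<T> NEQ v OR EXISTS a<other T>."""
    col_type = string_type(value)
    return neq(record, key, col_type, value) or exists(record, key, OTHER[col_type])


records = [{"a": "clp string"}, {"a": "string clp"}]

# NOT (a: "clp string" OR a: "b") is a NEQ "clp string" AND a NEQ "b".
matches = [
    r for r in records
    if augmented_neq(r, "a", "clp string") and augmented_neq(r, "a", "b")
]
```

With the extra EXISTS terms, the varstring filter on "b" is satisfied by any record whose "a" is a clpstring, so only the exact clpstring match is excluded.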

This edge case only applies to negated conditions -- every varstring fails to match a particular clpstring, so it should satisfy the negated filter, but something that does match a particular clpstring will never be a varstring.

The easiest way to handle this edge case is probably either to augment the AST, in ConvertToExists or in a separate pass, or to treat it very carefully during Schema Matching.

CLP version

9e6b755

Environment

Ubuntu focal docker image

Reproduction steps

  1. Ingest example data above
  2. Run example query

gibber9809 added the bug label Jan 28, 2024

@gibber9809
Contributor Author

Technically the bug is in NarrowTypes, because it's at that point that we throw away the possibility of matching other types (can match any string type -> can match one specific string type), so maybe the fix should be to augment the AST inside that pass.

@gibber9809
Copy link
Contributor Author

Also note that this bug does apply to wildcard columns (NOT *:"b" will fail for the same column matching reason mentioned above), but does not apply to filters on arrays.
