The query NOT a: "clp string" will return the second record, but the query NOT (a: "clp string" OR a: "b") will fail schema matching and return no records.
The AST for the failing case is AndExpr(!FilterExpr(EQ, ColumnDescriptor<clpstring,array>("a"), "clp string"), !FilterExpr(EQ, ColumnDescriptor<varstring,array>("a"), "b"))
The problem is that, since the dataset contains no varstring type column, the second filter fails column matching and gets replaced with EmptyExpr, and the entire AndExpr then gets constant-propagated away. This bug is a bit annoying because, while the varstring column "a" doesn't exist, the JSON string column "a" does exist.
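The collapse described above can be reproduced with a minimal toy model. This is a sketch only: `Filter`, `And`, `Empty`, `match_columns`, and `propagate_constants` are hypothetical stand-ins for FilterExpr, AndExpr, EmptyExpr, and the column matching and constant propagation passes, not the real CLP classes, and the schema is an assumed one in which "a" only ever appears as a clpstring.

```python
from dataclasses import dataclass


# Toy stand-ins for the AST nodes named in the issue (not the real CLP classes).
@dataclass(frozen=True)
class Filter:
    column: str
    col_type: str  # "clpstring" or "varstring"
    literal: str
    negated: bool = False


@dataclass(frozen=True)
class And:
    operands: tuple


@dataclass(frozen=True)
class Empty:
    pass


def match_columns(expr, schema):
    """Replace any filter whose (column, type) pair is absent from the schema with Empty."""
    if isinstance(expr, Filter):
        return expr if (expr.column, expr.col_type) in schema else Empty()
    if isinstance(expr, And):
        return And(tuple(match_columns(op, schema) for op in expr.operands))
    return expr


def propagate_constants(expr):
    """Collapse an And to Empty as soon as any of its operands is Empty."""
    if isinstance(expr, And):
        ops = tuple(propagate_constants(op) for op in expr.operands)
        return Empty() if any(isinstance(op, Empty) for op in ops) else And(ops)
    return expr


# Assumed schema: column "a" exists only with type clpstring.
schema = {("a", "clpstring")}

ast = And((
    Filter("a", "clpstring", "clp string", negated=True),
    Filter("a", "varstring", "b", negated=True),
))

result = propagate_constants(match_columns(ast, schema))
print(result)  # Empty()
```

In this model the first filter alone survives column matching, which mirrors why NOT a: "clp string" works while the two-term query does not.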
The expected behaviour is to treat the varstring filter as false on records where the clpstring column "a" exists (and the inverted filter as true).
More generally, whenever a string literal can match only varstring or only clpstring, the negated filter should carry an implicit condition on the existence of the other type.
For example a<clpstring> NEQ "a b" -> a<clpstring> NEQ "a b" OR EXISTS a<varstring>
and similarly a<varstring> NEQ "b" -> a<varstring> NEQ "b" OR EXISTS a<clpstring>.
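The two rewrites above can be sketched as a small AST transformation. Everything here is hypothetical and only illustrates the rule: `Neq`, `Exists`, `Or`, `augment_negated`, and `evaluate` are made-up names, and a record is modelled as a dict from column name to a `(type, value)` pair rather than anything CLP actually uses.

```python
from dataclasses import dataclass


# Hypothetical node types for illustration only.
@dataclass(frozen=True)
class Neq:
    column: str
    col_type: str  # "clpstring" or "varstring"
    literal: str


@dataclass(frozen=True)
class Exists:
    column: str
    col_type: str


@dataclass(frozen=True)
class Or:
    operands: tuple


OTHER_STRING_TYPE = {"clpstring": "varstring", "varstring": "clpstring"}


def augment_negated(f):
    """Rewrite a<T> NEQ lit into a<T> NEQ lit OR EXISTS a<other T>."""
    return Or((f, Exists(f.column, OTHER_STRING_TYPE[f.col_type])))


def evaluate(expr, record):
    """Evaluate expr against a record mapping column name -> (type, value)."""
    if isinstance(expr, Or):
        return any(evaluate(op, record) for op in expr.operands)
    if isinstance(expr, Exists):
        return expr.column in record and record[expr.column][0] == expr.col_type
    if isinstance(expr, Neq):
        if expr.column not in record:
            return False
        col_type, value = record[expr.column]
        # A typed filter only ever sees values of its own type.
        return col_type == expr.col_type and value != expr.literal
    raise TypeError(f"unsupported node: {expr!r}")


# Assumed record in which "a" is stored as a clpstring.
record = {"a": ("clpstring", "clp string")}
plain = Neq("a", "varstring", "b")

print(evaluate(plain, record))                   # False
print(evaluate(augment_negated(plain), record))  # True
```

The unaugmented varstring filter never sees the clpstring value and so rejects the record, while the augmented form accepts it through the EXISTS branch. A record with no "a" column at all is still rejected.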
This edge case applies only to negated conditions: a negated match against a particular clpstring is satisfied by every varstring (no varstring can equal it), whereas a positive match against a particular clpstring can never be satisfied by a varstring.
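The asymmetry can be checked on a handful of values. This sketch assumes, purely for illustration, that a string containing a space is stored as clpstring and any other string as varstring, which is consistent with the literals used in this issue. `string_type` is a hypothetical helper and the sample values are made up.

```python
def string_type(value: str) -> str:
    # Assumption for this sketch: strings with a space are clpstring, others varstring.
    return "clpstring" if " " in value else "varstring"


literal = "a b"  # narrows to clpstring under the assumption above
varstring_values = ["a", "b", "ab", "error"]  # made-up sample values

# Every sample value is a varstring, and the literal is a clpstring.
assert all(string_type(v) == "varstring" for v in varstring_values)
assert string_type(literal) == "clpstring"

# NEQ against the clpstring literal holds for every varstring value ...
all_neq = all(v != literal for v in varstring_values)
# ... while EQ against it holds for none of them.
any_eq = any(v == literal for v in varstring_values)

print(all_neq, any_eq)  # True False
```

Dropping the varstring column is therefore harmless for the EQ filter but loses matches for the NEQ filter.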
The easiest way to handle this edge case is probably either to augment the AST, during ConvertToExists or in a separate pass, or to treat it very carefully during Schema Matching.
Technically the bug is in NarrowTypes, because it's at that point that we throw away the possibility of matching other types (can match any string type -> can match one specific string type), so maybe the fix should be to augment the AST inside that pass.
Also note that this bug does apply to wildcard columns (NOT *:"b" will fail for the same column matching reason mentioned above), but does not apply to filters on arrays.
CLP version
9e6b755
Environment
Ubuntu focal docker image
Reproduction steps