-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add native 'array contains element' filter #15366
add native 'array contains element' filter #15366
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took one pass. LGTM so far. Will run locally today and do a 2nd pass
@@ -413,7 +413,24 @@ public class SqlBenchmark | |||
"SELECT APPROX_COUNT_DISTINCT_BUILTIN(dimZipf) FROM foo", | |||
"SELECT APPROX_COUNT_DISTINCT_DS_HLL(dimZipf) FROM foo", | |||
"SELECT APPROX_COUNT_DISTINCT_DS_HLL_UTF8(dimZipf) FROM foo", | |||
"SELECT APPROX_COUNT_DISTINCT_DS_THETA(dimZipf) FROM foo" | |||
"SELECT APPROX_COUNT_DISTINCT_DS_THETA(dimZipf) FROM foo", | |||
// 32: LATEST aggregator long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for moving these from expression benchmarks to here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems good. Though tbh, the whole matchers and processors classes are a maze of abstractions that I find difficult to navigate through. So I might have missed a thing here and there.
ExpressionType type = ExpressionType.LONG; | ||
ExpressionType type = null; | ||
for (Expr arg : args) { | ||
type = ExpressionTypeConversion.function(type, arg.getOutputType(inspector)); | ||
type = ExpressionTypeConversion.leastRestrictiveType(type, arg.getOutputType(inspector)); | ||
} | ||
return ExpressionType.asArrayType(type); | ||
return type == null ? null : ExpressionTypeFactory.getInstance().ofArray(type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was wrong in some cases and written before leastRestrictiveType
existed, for example if you try to use the array constructor function on an array type input, you end up with
org.apache.druid.segment.column.Types$IncompatibleTypeException: Cannot implicitly cast [LONG] to [ARRAY<STRING>]
which doesn't happen after the change. Will add a test.
@@ -53,7 +53,8 @@ | |||
@JsonSubTypes.Type(name = "equals", value = EqualityFilter.class), | |||
@JsonSubTypes.Type(name = "range", value = RangeFilter.class), | |||
@JsonSubTypes.Type(name = "isfalse", value = IsFalseDimFilter.class), | |||
@JsonSubTypes.Type(name = "istrue", value = IsTrueDimFilter.class) | |||
@JsonSubTypes.Type(name = "istrue", value = IsTrueDimFilter.class), | |||
@JsonSubTypes.Type(name = "arrayContainsElement", value = ArrayContainsElementFilter.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a shorter name such as arrayContains
sounds better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to confuse this native filter with the array_contains function which this filter can be used to implement but doesn't completely implement (see PR description), so I'd prefer to leave it like this to avoid confusion
processing/src/main/java/org/apache/druid/query/filter/ArrayContainsElementFilter.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/query/filter/ArrayContainsElementFilter.java
Show resolved
Hide resolved
} | ||
} | ||
|
||
private static class TypedConstantElementValueMatcherFactory extends EqualityFilter.TypedConstantValueMatcherFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some javadocs here would be useful.
this.doublePredicateSupplier = equalityPredicateFactory::makeDoublePredicate; | ||
this.floatPredicateSupplier = equalityPredicateFactory::makeFloatPredicate; | ||
} | ||
this.objectPredicateSupplier = makeObjectPredicateSupplier(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method results in a call stack that refers to typeDetectingArrayPredicateSupplier which is not initialized yet.
makeObjectPredicateSupplier -> makeArrayPredicate -> typeDetectingArrayPredicateSupplier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing will get these predicates until after the constructor though, what is the problem you're worried about?
@Nullable ColumnCapabilities columnCapabilities | ||
) | ||
{ | ||
return super.makeArrayProcessor(selector, columnCapabilities); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while it kinda does the right thing right now, it feels too tightly coupled with EqualityFilter. It confused me for some time till I realized that you are passing the predicateMatcherFactory to super class. The way I read it as that elementMatchValue can be compared to input array through "equals" comparison. I think it would be better to just call predicateMatcherFactory.makeArrayProcessor(selector, columnCapabilities);
and in makeComplexProcessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while it kinda does the right thing right now, it feels too tightly coupled with EqualityFilter
that is the point though, if the filter is matching a scalar column and the match value is a non-null scalar value then this filter should behave identical to equality filter, the ask during review of #15350 was to re-use as much of the equality filter as possible because in certain cases it is the equality filter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, i just removed these overrides since they were pointless
btw how do the benchmark numbers look like if you disable bitmap indexes? |
like with filterTuning? I didn't check because SQL has no way to control filterTuning, I would guess somewhere between the expression filter numbers and index numbers, but can check in a bit |
ok re-ran benchmarks (my laptop a bit busier this time than last time so stuff a bit slower) expression filter:
arrayContainsElement filter (indexes commented out):
arrayContainsElement filter (with indexes):
|
* add native arrayContainsElement filter to use array column element indexes
…mns (apache#15451) Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in apache#15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
* add native arrayContainsElement filter to use array column element indexes
…mns (apache#15451) Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in apache#15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
Description
Alternative mentioned in #15350, which is simplified and only matches a single array element rather than being a full implementation of 'ARRAY_CONTAINS'. Decided to do this one when I realized that multi-value dimensions would be handled incorrectly with the other filter when using array type element matches.
Array element indexes are already hiding inside of array type columns, this PR adds a native
arrayContainsElement
filter so we can actually use them to implement things likeARRAY_CONTAINS
(in this PR) andARRAY_OVERLAP
(not in this PR).No documentation currently added, will do that in a follow-up PR.
As expected, this is a giant perf improvement when using
ARRAY_CONTAINS
on array columns. For the query added in the benchmarks:Before:
After:
This PR has: