-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to SearchQuery to choose a search query execution strategy #3792
Conversation
…egy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query
Here are some JMH benchmark results ( Operation time with varying filter selectivity
Operation time with varying search dimension's cardinality |
@@ -71,6 +73,12 @@ | |||
public class SearchQueryRunner implements QueryRunner<Result<SearchResultValue>> | |||
{ | |||
private static final EmittingLogger log = new EmittingLogger(SearchQueryRunner.class); | |||
|
|||
private static final double HIGH_FILTER_SELECTIVITY_THRESHOLD_FOR_CONCISE = 0.99; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the strategies themselves return these constants? This way if we add additional bitmap algorithms we can just add new strategies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually reading more over this code, I think this might belong in the bitmap factories
@gianm thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these should be in the bitmap factories, since the thresholds are experimentally determined and specific to the Search Query algorithm. I doubt anything else would find them useful, so IMO, they belong in the Search Query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added SearchQueryDecisionHelper which provides such information.
} | ||
if (retVal.size() >= limit) { | ||
return makeReturnResult(limit, retVal); | ||
switch (strategy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having a switch statement here, can we instead have the strategy return the execution strategy?
return strategy.getExecutionPlan();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -49,6 +49,14 @@ | |||
private final List<DimensionSpec> dimensions; | |||
private final SearchQuerySpec querySpec; | |||
private final int limit; | |||
private final Strategy strategy; | |||
|
|||
public enum Strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if SearchStrategy should be an interface that we extend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
throw new ISE( | ||
"Null storage adapter found. Probably trying to issue a query against a segment being memory unmapped." | ||
); | ||
private static boolean isLowCardinality(final BitmapFactory bitmapFactory, final long totalCard) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should have a method on bitmapFactory that returns if its a low or high cardinality bitmap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to SearchQueryDecisionHelper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, I think this method works better in the SearchQuery since the meaning of "low" and "high" is very subjective and, in this case, determined experimentally for a specific query. It's possible that for a different query, different thresholds would be needed.
final Iterable<DimensionSpec> dimsToSearch; | ||
if (dimensions == null || dimensions.isEmpty()) { | ||
dimsToSearch = Iterables.transform(adapter.getAvailableDimensions(), Druids.DIMENSION_IDENTITY); | ||
private static boolean isHighSelectivityFilter(final QueryableIndex index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, the bitmap should tell us this information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to SearchQueryDecisionHelper
@@ -254,6 +275,10 @@ public boolean equals(Object o) | |||
return false; | |||
} | |||
|
|||
if (!strategy.equals(that.strategy)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about hashcode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjy thank you for your review. I updated my patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partial review since I started before @jihoonson's recent refactoring. Will review the new version again.
@@ -71,6 +73,12 @@ | |||
public class SearchQueryRunner implements QueryRunner<Result<SearchResultValue>> | |||
{ | |||
private static final EmittingLogger log = new EmittingLogger(SearchQueryRunner.class); | |||
|
|||
private static final double HIGH_FILTER_SELECTIVITY_THRESHOLD_FOR_CONCISE = 0.99; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these should be in the bitmap factories, since the thresholds are experimentally determined and specific to the Search Query algorithm. I doubt anything else would find them useful, so IMO, they belong in the Search Query.
@@ -94,6 +102,7 @@ public SearchQueryRunner(Segment segment) | |||
final SearchQuerySpec searchQuerySpec = query.getQuery(); | |||
final int limit = query.getLimit(); | |||
final boolean descending = query.isDescending(); | |||
final Strategy strategy = query.getStrategy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we do tunings like this as "context" flags rather than explicitly making objects for them. So I'd do something similar here. The code would be like:
final Strategy strategy = Strategy.valueOf(query.getContextValue(CTX_KEY_STRATEGY, DEFAULT_STRATEGY).toUpperCase());
Where CTX_KEY_STRATEGY and DEFAULT_STRATEGY are string constants that have values like "searchQueryStrategy" and "auto". See the groupBy "Query context" docs for a similar example. http://druid.io/docs/latest/querying/groupbyquery.html#query-context
final int limit, | ||
final BitmapFactory bitmapFactory) | ||
{ | ||
log.info("Index only query execution strategy is selected."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't log anything at info
level during a query, it will be way too many logs.
throw new ISE( | ||
"Null storage adapter found. Probably trying to issue a query against a segment being memory unmapped." | ||
); | ||
private static boolean isLowCardinality(final BitmapFactory bitmapFactory, final long totalCard) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, I think this method works better in the SearchQuery since the meaning of "low" and "high" is very subjective and, in this case, determined experimentally for a specific query. It's possible that for a different query, different thresholds would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far, left a few comments. thanks @jihoonson.
for (int i = startIndex; i <= endIndex; i++) { | ||
timeBitmap.add(i); | ||
} | ||
final SearchStrategy strategy = query.getStrategy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we do tunings like this as "context" flags rather than explicitly making objects for them. So I'd do something similar here. The code would be like:
final SearchStrategy strategy = SearchStrategy.fromString(query.getContextValue(CTX_KEY_STRATEGY, DEFAULT_STRATEGY));
Where CTX_KEY_STRATEGY and DEFAULT_STRATEGY are string constants that have values like "searchQueryStrategy" and "auto", and SearchStrategy.fromString is something like QueryGranularity.fromString.
To me the main reason to prefer context parameters is that it allows users to add tunings from newer versions of Druid without their client needing to be aware of a new query field. Almost all Druid clients allow users to add arbitrary context parameters but not all of them allow arbitrary top-level query parameters.
See also GroupByStrategySelector and the groupBy "Query context" http://druid.io/docs/latest/querying/groupbyquery.html#query-context for how strategy selection & tuning is handled in GroupBy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll change.
if (filter == null || | ||
index.getDecisionHelper().hasLowCardinality(index, dimsToSearch) || | ||
index.getDecisionHelper().hasHighSelectivity(index, timeFilteredBitmap)) { | ||
log.info("Index-only execution strategy is selected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't log anything at info
level during query execution, there will be too many logs. debug
or trace
is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Metadata getMetadata(); | ||
Map<String, DimensionHandler> getDimensionHandlers(); | ||
|
||
SearchQueryDecisionHelper getDecisionHelper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this decision-making should be self-contained within the Search query rather than bleeding into the QueryableIndex interface.
My rationale is that the thresholds embodied in this decision helper are experimentally determined and specific to the Search query algorithm, and ideally nothing else should know about them. Moving this to the Search query does mean that the Search query will need to know about different kinds of bitmaps, but I think that's better than the QueryableIndex knowing about the Search query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. How about making DecisionHelper based on index type in SearchStrategy? DecisionHelper can be used in only SearchStrategy as well as separating from index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me.
// Index-only strategy is selected when | ||
// 1) there is no filter, | ||
// 2) the total cardinality is very low, or | ||
// 3) the filter has a very high selectivity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filter is highly selective, doesn't that mean we want to use the cursor-based strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we're using different definitions of "high selectivity". I think of it the same way as this guy: https://blogs.msdn.microsoft.com/bartd/2011/01/25/query-tuning-fundamentals-density-predicates-selectivity-and-cardinality/
Selectivity for a filter predicate against a base table can be calculated as “[# rows that pass the predicate]/[# rows in the table]”. If the predicate passes all rows in the table, its selectivity is 1.0. If it disqualifies all rows, its selectivity is 0. (This can be confusing. Note that 0.000001 reflects a high selectivity even though the number is small, while 1.0 is low selectivity even though the number is higher.)
So > .99 selectivity would be "very low" if the two of us are thinking about it the right way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I was confused. Thanks.
throw new IAE("Should only have one interval, got[%s]", intervals); | ||
} | ||
final Interval interval = intervals.get(0); | ||
final ImmutableBitmap timeFilteredBitmap = IndexOnlyExecutor.makeTimeFilteredBitmap(index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a couple of performance issues here:
makeTimeFilteredBitmap
isn't cheap, and if we use the index-based execution we will end up calling it twice.filter.getBitmapIndex
isn't cheap either, and if we end up doing the cursor-based execution, it'll get called twice too (once by makeTimeFilteredBitmap and once by makeCursors).
Some possible strategies to help:
- Don't generate the timeFilteredBitmap until we actually need to call
hasHighSelectivity
. i.e. if the search dims are low cardinality, just go straight to the index-only strategy. And if they're very high cardinality, similar to the number of rows in the segment, then possibly go straight to the cursor-only strategy (please do some benchmarks to verify this guess though). We should only need to check the filter selectivity if search dimension cardinality is medium. - Save the timeFilteredBitmap after generating it, and pass it to the index-based executor if we choose that one.
I think one thing we should not do is save the bitmap and pass it to makeCursors. That's probably too leaky and I would rather accept the performance hit there.
static Iterable<DimensionSpec> getDimsToSearch(Indexed<String> availableDimensions, List<DimensionSpec> dimensions) | ||
{ | ||
if (dimensions == null || dimensions.isEmpty()) { | ||
return Iterables.transform(availableDimensions, Druids.DIMENSION_IDENTITY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is lazily computed, to save on allocations you can instead wrap it in ImmutableList.copyOf
, like
return ImmutableList.copyOf(Iterables.transform(availableDimensions, Druids.DIMENSION_IDENTITY));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -171,6 +159,39 @@ private void setupQueries() | |||
basicQueries.put("A", queryBuilderA); | |||
} | |||
|
|||
{ // basic.B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please attach results of this benchmark before and after the patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add.
@JsonSubTypes.Type(name = "indexOnly", value = IndexOnlyStrategy.class), | ||
@JsonSubTypes.Type(name = "cursorBased", value = CursorBasedStrategy.class) | ||
}) | ||
public abstract class SearchStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Jackson annotations here are unnecessary if this is changed to be a context parameter.
@fjy could you chime in about a whether you agree with having the strategy be a context parameter (#3792 (comment)) and whether you agree that the search query should self-contain its own decision making with respect to the bitmap types (#3792 (comment))? |
@gianm agree |
Addressed comments. I'll fix the conflicts and add benchmark results soon. |
…h-query-strategy
Here is a result of simple benchmark.
With concise bitmap
With roaring bitmap
You can also see the comparison graph here.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jihoonson. Other than the line comments I've got these two general ones:
- please add docs for the new context flags to searchquery.md (see groupbyquery.md for an example)
- please apply Druid code style to all files as mentioned in https://github.com/druid-io/druid/blob/master/CONTRIBUTING.md
|
||
switch (strategyString) { | ||
case AutoStrategy.NAME: | ||
log.debug("Auto strategy is selected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to include the query ID in these log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
if (filter == null || | ||
helper.hasLowCardinality(index, dimsToSearch) || | ||
helper.hasLowSelectivity(index, timeFilteredBitmap)) { | ||
log.debug("Index-only execution strategy is selected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to include the query id in this log message too, from query.getId()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
public class IndexOnlyStrategy extends SearchStrategy | ||
{ | ||
public static final String NAME = "indexOnly"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"indexOnly" isn't really the right name for this, since we might not always use the index. How about naming the three strategies "cursorOnly", "useIndexes", and "auto"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I changed names.
@JsonProperty | ||
@Min(1) | ||
private int maxSearchLimit = 1000; | ||
|
||
@JsonProperty | ||
private String searchStrategy = AutoStrategy.NAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made me wonder if auto is a good default. TLDR: I think default to auto is fine.
Longer answer: It looks to me like there will be no necessary performance degradation from the old code by defaulting to "auto". There could be some degradation for some queries if we choose cursor-based when index-based would actually be better, but I'm willing to take that risk in order to have #3775 fixed with the default config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Let's make index-only strategy default. We can change later if auto strategy becomes sufficiently smart and efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think that's fine, in that case I'll tag this with "release notes" so we can let people know in the notes that there's a new option they can turn on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After improving auto
strategy by avoiding computing timeFilteredBitmap
, I think we can use it as the default search strategy. It shows nearly optimal performance. Please refer to #3792 (comment).
@@ -0,0 +1,12 @@ | |||
package io.druid.query.search.search; | |||
|
|||
public class ConciseBitmapDecisionHelper extends SearchQueryDecisionHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make this a singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import io.druid.collections.bitmap.BitmapFactory; | ||
import io.druid.collections.bitmap.ConciseBitmapFactory; | ||
import io.druid.collections.bitmap.ImmutableBitmap; | ||
import io.druid.collections.bitmap.WrappedImmutableConciseBitmap; | ||
import it.uniroma3.mat.extendedset.intset.ImmutableConciseSet; | ||
|
||
import java.nio.ByteBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or this file.
QueryRunnerFactory factory = new SearchQueryRunnerFactory(new SearchQueryQueryToolChest( | ||
new SearchQueryConfig(), | ||
final SearchQueryConfig config = new SearchQueryConfig(); | ||
QueryRunnerFactory factory = new SearchQueryRunnerFactory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This supplier can be Suppliers.ofInstance(config)
.
// Index-only strategy is selected when | ||
// 1) there is no filter, | ||
// 2) the total cardinality is very low, or | ||
// 3) the filter has a very low selectivity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I wonder if we can do a better heuristic here. Instead of checking for "low cardinality" and "low selectivity" does it make sense to use cost functions like:
- cursor-based is
numRowsInSegment * filterSelectivity * costOfCheckingSingleValue
- index-based with a filter is
dimensionCardinality * (costOfCheckingSingleValue + searchQuerySelectivity * costOfFilterIntersection)
- index-based without a filter is
dimensionCardinality * costOfCheckingSingleValue
And choose the strategy with the lowest cost. Unfortunately we don't know what searchQuerySelectivity is in advance, so I think we'd have to just use some constant.
One behavior difference is that if dimensionCardinality is equal to numRowsInSegment, like a unique key, then cost functions like that would choose the cursor-based strategy regardless of filter selectivity (since dimensionCardinality >= numRowsInSegment * filterSelectivity
for all values of filterSelectivity). The current code would still use the index-based approach if filter selectivity is lower than the threshold. I'd guess that using cursor-based is actually better, though.
We can also skip computing timeFilteredBitmap if cursor-based already beats index-based (or comes close) before applying the * filterSelectivity
term. Based on your benchmarks, it looks like this can save substantial time.
@jihoonson what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to doing either the cost based approach or your current approach, so feel free to try to convince me either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the good suggestion. I agree that cost-based approach is cool. Here are some issues what I'm considering.
- Disk access patterns of cursor-based and index-only strategies look different. The disk access cost can be ignored like you said above only when they are almost same, but I'm not sure that they are.
- When using cursor-based strategy, some filters can be pushed down into the cursor (preFilters), thereby actually skipping reading non-result data. However, it is difficult to figure out which part of filters are preFilters outside
QueryableIndexStorageAdapter.makeCursors()
. To do so, some refactoring is needed. This also makes implementing cost-based approach difficult.
I think we can simply ignore these details at first, and improve later. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, ignoring this for now sounds good.
Although your comment makes me realize that SearchQueryRunner has a bug right now… it calls filter.getBitmapIndex
without first calling filter.supportsBitmapIndex
to see if that call is valid. The Filter interface doesn't say this (it probably should) but it is bad to call getBitmapIndex if supportsBitmapIndex is false. Callers should use makeMatcher instead.
We should fix that; do you prefer to do it as part of this patch of a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, since we are ignoring this for now, could you please write up a github issue about how we could make this smarter in the future? That way, if someone ever complains about the smartness, we can point them to that issue and ask them to help :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is equivalent now, but only by luck. We are planning to add some filters in the future that don't ever support bitmaps (like filters that filter based on expressions). Those would return false from "supportsBitmapIndex" even if the underlying dimensions did have indexes, since they'd always want to operate on a row by row basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, filter.getBitmapIndex()
is called for the query-level filter, which might be on a different column than the dimension being search, so even with the current set of filters, things will not work properly. For example try filtering on a long column and searching on a different, string column with a single search query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I missed this is possible. I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for timeFilteredBitmap
, making it is expensive because it requires a lot of unions of bitmaps for filters.
I have a plan to avoid unions by estimating filter selectivity. I already have some benchmark result which shows nearly optimal performance. However, this work needs some new codes and refactoring, so I think it would be better to proceed in another issue. I'll make a PR soon.
public class ConciseBitmapDecisionHelper extends SearchQueryDecisionHelper | ||
{ | ||
private static final double LOW_FILTER_SELECTIVITY_THRESHOLD = 0.99; | ||
private static final int LOW_CARDINALITY_THRESHOLD = 5000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please briefly describe, in a comment, where these numbers came from? (if you decide to keep using them rather than the cost based approach)
public class RoaringBitmapDecisionHelper extends SearchQueryDecisionHelper | ||
{ | ||
private static final double LOW_FILTER_SELECTIVITY_THRESHOLD = 0.65; | ||
private static final int LOW_CARDINALITY_THRESHOLD = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please briefly describe, in a comment, where these numbers came from? (if you decide to keep using them rather than the cost based approach)
@gianm, thank you for the review. I updated the patch. |
@@ -0,0 +1,60 @@ | |||
package io.druid.query.search.search; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing license header here and in other new files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you saw an old patch. I added license headers.
👍 from me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jihoonson - looks good other than some minor comments about docs & tests.
|
||
#### Strategies | ||
|
||
Search queries can be executed using two different strategies. The default strategy is determeined by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"determined" (spelling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks.
only the rows which satisfy those filters, thereby saving I/O cost. However, it might be slow with filters of low selectivity. | ||
|
||
- "auto" strategy uses a cost-based planner for choosing an optimal search strategy. It estimates the cost of index-only | ||
and cursor-based execution plans, and chooses the optimal one. Currently, its performance is suboptimal due to the large |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "it is not enabled by default due to the overhead of cost estimation."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
#### Query context | ||
|
||
The following runtime properties apply: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are "query context parameters" not "runtime properties".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
configs[0] = new SearchQueryConfig(); | ||
configs[0].setSearchStrategy(UseIndexesStrategy.NAME); | ||
configs[1] = new SearchQueryConfig(); | ||
configs[1].setSearchStrategy(CursorOnlyStrategy.NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for AutoStrategy too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
return new UseIndexesStrategy(query, false, timeFilteredBitmap); | ||
} | ||
|
||
private UseIndexesStrategy(SearchQuery query, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this file have codestyle applied? Are you using Eclipse or IntelliJ? It's not the same as what my IntelliJ does when I apply the codestyle.xml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about this problem. I'll apply the codestyle to all changed files.
index | ||
); | ||
|
||
// Index-only plan is used only when any filter is not specified or every filter supports bitmap indexes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be "the filter" instead of "every filter" (there's just one filter). Maybe this is "every" since you were thinking of decomposing ANDs, but that's not happening right now, so it should be "the"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
@gianm, thank you for your review. I updated the patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, 👍
…egy (apache#3792) * Add an option to SearchQuery to choose a search query execution strategy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query * Add SearchStrategy and SearchQueryExecutor * Address comments * Rename strategies and set UseIndexesStrategy as the default strategy * Add a cost-based planner for auto strategy * Add document * Fix code style * apply code style * apply comments
This PR is related to #3775
Supported strategies are
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)