Option to configure default analysis types in SegmentMetadataQuery #4259
Conversation
```
@JsonProperty
private Period defaultHistory = ISO_FORMATTER.parsePeriod(DEFAULT_PERIOD_STRING);

@JsonProperty
private EnumSet<SegmentMetadataQuery.AnalysisType> defaultAnalysisType = DEFAULT_ANALYSIS_TYPES;
```
Suggested defaultAnalysisTypes (update getter and setter names too)
I think with the way this is currently written, all nodes involved in a query (broker, historical, etc) will apply their own defaultAnalysisTypes, meaning we'll get weird behavior if different nodes have different values. To fix that, the broker should rewrite the query to include explicit analysis types, to prevent the historicals/etc from filling in their own (possibly different) set. If that's already happening somewhere then I missed it.
One place that could be done is in `SegmentMetadataQueryQueryToolChest.mergeResults`; it gets called pretty early in the query pipeline and can do rewrites like that. If you make this change then you should be able to revert the cache strategy back to the old code, since the caching layer won't activate until after the query is rewritten, and it can assume analysisTypes is set to an explicit value. As a sanity check you could have the cache strategy throw some kind of IllegalArgumentException if it's called on a query where analysisTypes is still null.
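A rough sketch of that broker-side rewrite (simplified shape with a hypothetical helper name; the real tool chest signatures differ):

```java
// Fill in explicit analysisTypes once at the broker, before the query fans out,
// so historicals never apply their own (possibly different) defaults.
private SegmentMetadataQuery rewriteIfNeeded(SegmentMetadataQuery query, SegmentMetadataQueryConfig config)
{
  if (query.getAnalysisTypes() != null) {
    return query; // already explicit; pass through unchanged
  }
  return query.withAnalysisTypes(config.getDefaultAnalysisTypes());
}
```

The cache strategy can then treat `analysisTypes == null` as a bug and throw `IllegalArgumentException`, since it only runs after the rewrite.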
```
public EnumSet<SegmentMetadataQuery.AnalysisType> getAnalysisTypes(SegmentMetadataQuery query)
{
  if (query.getAnalysisTypes() == null) {
    return config != null ? config.getDefaultAnalysisTypes() : SegmentMetadataQueryConfig.DEFAULT_ANALYSIS_TYPES;
```
It seems like handling null config shouldn't be necessary here. It shouldn't be null in production. If it is null sometimes in testing, how about replacing those spots with `new SegmentMetadataQueryConfig()` to get a default config? Then `SegmentMetadataQueryConfig.DEFAULT_ANALYSIS_TYPES` could be private.
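For example, a test site that currently passes a null config could construct the defaults instead (a minimal illustration):

```java
// Instead of passing null into the tool chest, use the config's built-in defaults.
SegmentMetadataQueryConfig config = new SegmentMetadataQueryConfig();
```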
Given that the config is specific to the broker, it has been reimplemented to not be reliant on the query for analysisTypes.
docs/content/configuration/broker.md (outdated)

```
@@ -86,6 +86,7 @@ See [groupBy server configuration](../querying/groupbyquery.html#server-configur
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.query.segmentMetadata.defaultHistory`|When no interval is specified in the query, use a default interval of defaultHistory before the end time of the most recent segment, specified in ISO8601 format. This property also controls the duration of the default interval used by GET /druid/v2/datasources/{dataSourceName} interactions for retrieving datasource dimensions/metrics.|P1W|
+|`druid.query.segmentMetadata.defaultAnalysisTypes`|This can be used to set the default analysis types for all segment metadata queries; this can be overridden when making the query.|[CARDINALITY, INTERVAL, MINMAX]|
```
The default value here should be updated to `["cardinality", "interval", "minmax"]`. Also please point to this config in the docs of Segment Metadata Queries, line 35.
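For reference, a hypothetical broker runtime.properties entry would then look like (Druid list-typed properties are written as JSON arrays):

```properties
druid.query.segmentMetadata.defaultHistory=P1W
druid.query.segmentMetadata.defaultAnalysisTypes=["cardinality", "interval", "minmax"]
```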
@fjy this PR actually blocks us from updating to 0.10.0 (not even 0.10.1), so it fits the criteria that I suggested before, unlike many other PRs that currently target 0.10.1 and don't block anybody from updating to 0.10.0.
I think @gianm is proposing that the broker overwrite the analysis types if null is passed in, and if someone sets them explicitly, that value is used as-is.
Yeah, I meant what @drcrallen said. In particular I'm suggesting that if the broker receives a query without analysisTypes set, it should rewrite the query to include analysisTypes before it passes it down to other nodes. Otherwise different nodes involved in the query might not agree on what the analysisTypes are, which could cause weird results.
I'm not sure if this was the best way of doing it, but I made the rewrite happen in `SegmentMetadataQueryQueryToolChest.mergeResults`.
```
@@ -166,12 +169,66 @@ public String getType()
   return analysisTypes;
 }

+public void setAnalysisTypes(EnumSet<AnalysisType> analysisTypes)
```
The query object should be immutable. Please replace this method with `withAnalysisTypes()`, returning a new query object with the given analysis types. Also update `SegmentMetadataQueryBuilder` to include the new field.
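A minimal sketch of the requested immutable "with" method (assuming the builder gains a matching `analysisTypes` setter, per the note above):

```java
public SegmentMetadataQuery withAnalysisTypes(EnumSet<AnalysisType> analysisTypes)
{
  // copy-on-write: leave this query untouched and return a modified copy
  return Druids.SegmentMetadataQueryBuilder.copy(this)
      .analysisTypes(analysisTypes)
      .build();
}
```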
```
@@ -107,12 +108,14 @@ public SegmentMetadataQueryQueryToolChest(SegmentMetadataQueryConfig config, Gen
   Map<String, Object> context
 )
 {
-  Query<SegmentAnalysis> query = queryPlus.getQuery();
+  SegmentMetadataQuery castedQuery = (SegmentMetadataQuery) queryPlus.getQuery();
+  SegmentMetadataQuery updatedQuery = (SegmentMetadataQuery) (castedQuery.withAnalysisTypes(getFinalAnalysisTypes(castedQuery)));
```
`withAnalysisTypes()` could return `SegmentMetadataQuery`, and then the casting would not be needed.
Suggest joining those methods into a single method like `SegmentMetadataQuery.withFinalizedAnalysisTypes(toolChest)`, for less boilerplate, because currently they are always used together. Also, this method could avoid creating a new query object and return the same instance if analysisTypes are already set.
```
public EnumSet<SegmentMetadataQuery.AnalysisType> getFinalAnalysisTypes(SegmentMetadataQuery query)
{
  if (query.getAnalysisTypes() == null) {
```
Suggested `Objects.firstNonNull()`.
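Presumably Guava's `Objects.firstNonNull`, which would collapse the null check into one expression; a sketch:

```java
import com.google.common.base.Objects;

EnumSet<SegmentMetadataQuery.AnalysisType> types =
    Objects.firstNonNull(query.getAnalysisTypes(), config.getDefaultAnalysisTypes());
```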
```
 return new BinaryFn<SegmentAnalysis, SegmentAnalysis, SegmentAnalysis>()
 {
-  private final SegmentMetadataQuery query = (SegmentMetadataQuery) inQ;
+  private final SegmentMetadataQuery query = updatedQuery;
```
Why is this field needed?
```
@@ -121,7 +124,10 @@ public SegmentMetadataQueryQueryToolChest(SegmentMetadataQueryConfig config, Gen
 @Override
 protected Ordering<SegmentAnalysis> makeOrdering(Query<SegmentAnalysis> query)
```
There seems to be no point for this anonymous QueryRunner to extend `ResultMergeQueryRunner` rather than `BySegmentSkippingQueryRunner` directly, because it overrides `doRun()`. Then this QueryRunner doesn't need to follow the API of `ResultMergeQueryRunner`; it may accept `SegmentMetadataQuery` instead of `Query<SegmentAnalysis>` in `makeOrdering()` and `createMergeFn()`, to avoid casting.
```
@@ -121,7 +124,10 @@ public SegmentMetadataQueryQueryToolChest(SegmentMetadataQueryConfig config, Gen
 @Override
 protected Ordering<SegmentAnalysis> makeOrdering(Query<SegmentAnalysis> query)
 {
-  if (((SegmentMetadataQuery) query).isMerge()) {
+  SegmentMetadataQuery castedQuery = (SegmentMetadataQuery) query;
+  SegmentMetadataQuery updatedQuery = (SegmentMetadataQuery) (castedQuery.withAnalysisTypes(getFinalAnalysisTypes(castedQuery)));
```
This is already done in the upstream doRun() call.
```
  return this;
}


```
Extra line
```
@@ -178,19 +187,21 @@ public SegmentAnalysis apply(SegmentAnalysis arg1, SegmentAnalysis arg2)
 @Override
 public CacheStrategy<SegmentAnalysis, SegmentAnalysis, SegmentMetadataQuery> getCacheStrategy(final SegmentMetadataQuery query)
 {
+
```
Extra line
```
 return new CacheStrategy<SegmentAnalysis, SegmentAnalysis, SegmentMetadataQuery>()
 {
   @Override
-  public boolean isCacheable(SegmentMetadataQuery query, boolean willMergeRunners)
+  public boolean isCacheable(SegmentMetadataQuery updatedQuery, boolean willMergeRunners)
```
Shouldn't this parameter keep its original name `query`?
```
@@ -235,7 +246,8 @@ public SegmentAnalysis apply(@Nullable SegmentAnalysis input)
 @Override
 public <T extends LogicalSegment> List<T> filterSegments(SegmentMetadataQuery query, List<T> segments)
 {
-  if (!query.isUsingDefaultInterval()) {
+  SegmentMetadataQuery updatedQuery = (SegmentMetadataQuery) query.withAnalysisTypes(getFinalAnalysisTypes(query));
```
This is pointless, because it doesn't affect `isUsingDefaultInterval`.
```
@@ -962,6 +962,7 @@ public static ResultBuilder newResultBuilder()
 private Boolean merge;
 private Boolean lenientAggregatorMerge;
 private Map<String, Object> context;
+private Boolean usingDefaultInterval;
```
useDefaultInterval seems to be an unnecessary configuration that makes it possible to create an inconsistency, if you pass useDefaultInterval=false along with a querySegmentSpec which actually represents the default interval to the SegmentMetadataQuery constructor.
I suggest the following plan:
- Don't add usingDefaultInterval to this builder.
- Leave the usingDefaultInterval parameter of SegmentMetadataQuery for compatibility, but ignore it, and document the fact that it is going to be removed.
- In the constructor, set usingDefaultInterval=true if querySegmentSpec == null, or if querySegmentSpec is not null and it has just one interval which is equal to the default interval (sketched below).
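Sketched out, the constructor logic from the last bullet would be roughly:

```java
if (querySegmentSpec == null) {
  this.usingDefaultInterval = true;
} else {
  List<Interval> intervals = querySegmentSpec.getIntervals();
  // the spec counts as "default" only if it is exactly the one default interval
  this.usingDefaultInterval =
      intervals.size() == 1 && intervals.get(0).equals(DEFAULT_INTERVAL);
}
```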
```
 {
   return new BinaryFn<SegmentAnalysis, SegmentAnalysis, SegmentAnalysis>()
   {
-    private final SegmentMetadataQuery query = (SegmentMetadataQuery) inQ;
+    private final SegmentMetadataQuery query = inQ;
```
This field is not needed
```
   ),
   MERGE_TRANSFORM_FN
 );
}

 @Override
-protected Ordering<SegmentAnalysis> makeOrdering(Query<SegmentAnalysis> query)
+protected Ordering<SegmentAnalysis> makeOrdering(SegmentMetadataQuery query)
```
Could be private
```
@@ -138,12 +138,11 @@ public int compare(
   return query.getResultOrdering(); // No two elements should be equal, so it should never merge
 }

 @Override
-protected BinaryFn<SegmentAnalysis, SegmentAnalysis, SegmentAnalysis> createMergeFn(final Query<SegmentAnalysis> inQ)
+protected BinaryFn<SegmentAnalysis, SegmentAnalysis, SegmentAnalysis> createMergeFn(final SegmentMetadataQuery inQ)
```
Could be private
```
 import io.druid.query.spec.MultipleIntervalSegmentSpec;
 import io.druid.query.spec.QuerySegmentSpec;
+import jdk.nashorn.internal.ir.annotations.Ignore;
```
Don't use an unrelated annotation (this `@Ignore` is from the Nashorn internals, not a Druid or test annotation).
```
@@ -106,7 +102,7 @@ public SegmentMetadataQuery(
   @JsonProperty("merge") Boolean merge,
   @JsonProperty("context") Map<String, Object> context,
   @JsonProperty("analysisTypes") EnumSet<AnalysisType> analysisTypes,
-  @JsonProperty("usingDefaultInterval") Boolean useDefaultInterval,
+  @Ignore @JsonProperty("usingDefaultInterval") Boolean useDefaultInterval,
```
Please document in a simple comment that this parameter is ignored; also add a note that it is going to be removed and is left for now only for the sake of compatibility.
```
@@ -121,11 +117,17 @@ public SegmentMetadataQuery(
 if (querySegmentSpec == null) {
   this.usingDefaultInterval = true;
 } else {
-  this.usingDefaultInterval = useDefaultInterval == null ? false : useDefaultInterval;
+  if (querySegmentSpec.getIntervals().size() == 1 && querySegmentSpec.getIntervals()
```
Missing `.get(0)`?
Also please add a test for this
Please address this. Also test that if the default interval is explicitly specified via the query builder (rather than being left as `null`, which gets replaced with the default interval in `SegmentMetadataQuery`'s constructor), then `usingDefaultInterval` is correctly set to true.
Ok, I somehow missed `.get(0)` on the next line. The formatting is strange; suggested:

```java
if (querySegmentSpec.getIntervals().size() == 1 &&
    querySegmentSpec.getIntervals().get(0).equals(DEFAULT_INTERVAL)) {
```
```
@@ -255,6 +256,14 @@ public boolean hasMinMax()
   return Druids.SegmentMetadataQueryBuilder.copy(this).toInclude(includerator).build();
 }

+public SegmentMetadataQuery withFinalizedAnalysisTypes(SegmentMetadataQueryConfig config)
+{
+  return Druids.SegmentMetadataQueryBuilder
```
Don't create a new query object if analysisTypes are already non-null; just return `this`.
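That is, something along these lines (a sketch reusing the builder copy pattern shown above):

```java
public SegmentMetadataQuery withFinalizedAnalysisTypes(SegmentMetadataQueryConfig config)
{
  if (analysisTypes != null) {
    return this; // nothing to fill in; avoid allocating a new query
  }
  return Druids.SegmentMetadataQueryBuilder.copy(this)
      .analysisTypes(config.getDefaultAnalysisTypes())
      .build();
}
```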
@gkc2104 BTW is this valid that …
@leventov Agreed, it is building a new interval using Default History and the max segment time.
```
@@ -1077,6 +1081,33 @@ public Interval getInterval()
 for (int i = 0; i < filteredSegments2.size(); i++) {
   Assert.assertEquals(expectedSegments2.get(i).getInterval(), filteredSegments2.get(i).getInterval());
 }
```
@leventov added the `isUsingDefaultInterval` assertions in here along with a failing test (because of the filterSegments implementation). What should I do about filterSegments though?
Updated.
```
 final Interval targetInterval = new Interval(config.getDefaultHistory(), targetEnd);
 List<Interval> intervals = query.getIntervals();

+DateTime queryStartTime = JodaUtils.ETERNITY.getEnd();
```
In other parts of the Druid codebase, `new DateTime(JodaUtils.MAX_INSTANT)` is used.
```
 List<Interval> intervals = query.getIntervals();

 DateTime queryStartTime = JodaUtils.ETERNITY.getEnd();
+DateTime queryEndTIme = JodaUtils.ETERNITY.getStart();
```
queryEndTime
```
 DateTime queryStartTime = JodaUtils.ETERNITY.getEnd();
 DateTime queryEndTIme = JodaUtils.ETERNITY.getStart();

+for (Interval interval : intervals) {
```
Consider:

```java
queryStartTime = intervals.stream().map(Interval::getEnd)
    .max(Ordering.natural()).orElseThrow(IllegalStateException::new);
```
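A fuller sketch along the same lines for both bounds (assuming the replaced loop takes the minimum of interval starts and the maximum of interval ends, as the initial values above suggest):

```java
// Guava's Ordering works as a Comparator here; DateTime is Joda-Time.
DateTime queryStartTime = intervals.stream().map(Interval::getStart)
    .min(Ordering.natural()).orElseThrow(IllegalStateException::new);
DateTime queryEndTime = intervals.stream().map(Interval::getEnd)
    .max(Ordering.natural()).orElseThrow(IllegalStateException::new);
```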
@fjy restoring 0.10.1 milestone because it's an important change that restores compatibility of segment metadata queries. @gianm @drcrallen could you please review the design again?
Looks good except for the defaultInterval stuff. I'm not sure how that got roped into this patch, but I think it's fine the way it is and doesn't need to be changed.
@leventov wrote,
> useDefaultInterval seems to be an unnecessary configuration that makes it possible to create an inconsistency, if you pass useDefaultInterval=false along with a querySegmentSpec which actually represents the default interval to the SegmentMetadataQuery constructor.
>
> I suggest the following plan:
> - Don't add usingDefaultInterval to this builder.
> - Leave the usingDefaultInterval parameter of SegmentMetadataQuery for compatibility, but ignore it, and document the fact that it is going to be removed.
> - In the constructor, set usingDefaultInterval=true if querySegmentSpec == null, or if querySegmentSpec is not null and it has just one interval which is equal to the default interval.
See #1732 (comment) for the original rationale about why this exists. I guess it was confusing, so we should add a comment, but it does serve a purpose.
It definitely shouldn't be something a user should be able to provide in the builder. Its only purpose is to help "remember" whether or not a user provided intervals. The idea is that we wanted providing `null` and providing the default interval explicitly to have different behavior: `null` uses `druid.query.segmentMetadata.defaultHistory` prior to maxTime, whereas explicitly specifying the interval from `DEFAULT_INTERVAL` will do exactly what the user asked for, and use that specific interval, not the one based on `druid.query.segmentMetadata.defaultHistory`.
```
@@ -235,18 +236,27 @@ public SegmentAnalysis apply(@Nullable SegmentAnalysis input)
 @Override
 public <T extends LogicalSegment> List<T> filterSegments(SegmentMetadataQuery query, List<T> segments)
```
What's the purpose of the changes in this method? They don't seem related to the rest of the patch.
Yes, these are unrelated to the changes proposed for this patch; I'll open up another PR for this.
The current implementation does not take the time range into consideration while filtering; I added an improvement to only include segments that are within the intervals mentioned by the query.
> The current implementation does not take the time range into consideration while filtering; I added an improvement to only include segments that are within the intervals mentioned by the query.
filterSegments doesn't need to worry about that, since the list of segments it gets has already been pruned down to whatever matches the intervals filter. In fact many implementations just `return segments;` with no changes -- check out timeseries for an example. filterSegments only has to potentially filter it down even further, if it wants (see timeBoundary for another example of that). I suppose the javadocs for filterSegments could be more clear on this.
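For comparison, a sketch of that no-op behavior (paraphrased, not the exact Druid source):

```java
@Override
public <T extends LogicalSegment> List<T> filterSegments(SegmentMetadataQuery query, List<T> segments)
{
  // the list is already pruned to the query's intervals upstream;
  // override this only to narrow it further (as timeBoundary does)
  return segments;
}
```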
Please also see #4259 (review) for rationale about the current behavior of defaultInterval.
I was under the impression that this method was called to prune the segments to include only the required ones from a list of all segments; I didn't know it was already pruned. Now that I look at the examples you suggested, I understand why it is required even if we already prune it.
Regarding defaultInterval: I had some failing tests that were because of defaultInterval; I don't remember the exact test case, but reverting to the old behavior did not break any tests (probably the change of not having null configs helped, I guess), so we can keep the implementation that is on master. I excluded these changes from this PR.
```
@@ -87,7 +87,8 @@ public SegmentMetadataQueryRunnerFactory(
 public Sequence<SegmentAnalysis> run(QueryPlus<SegmentAnalysis> inQ, Map<String, Object> responseContext)
 {
   SegmentMetadataQuery query = (SegmentMetadataQuery) inQ.getQuery();
-  final SegmentAnalyzer analyzer = new SegmentAnalyzer(query.getAnalysisTypes());
+  SegmentMetadataQuery updatedQuery = query.withFinalizedAnalysisTypes(toolChest.getConfig());
```
Might as well do this and the cast in one line, to avoid accidentally referencing `query` instead of `updatedQuery`.
```
@@ -106,6 +101,7 @@ public SegmentMetadataQuery(
 @JsonProperty("merge") Boolean merge,
 @JsonProperty("context") Map<String, Object> context,
 @JsonProperty("analysisTypes") EnumSet<AnalysisType> analysisTypes,
+// useDefaultInterval will be removed, but is left for now for compatibility
```
What's wrong with useDefaultInterval?
LGTM.
Needs one more design review.
#3773

Now that defaultAnalysisTypes is being updated, it would make it easier for users to update to 0.10.0 if they could easily revert to the old behavior without having to update all the SegmentMetadataQueries they use. Suggest not only providing an option to revert to the old behavior, but also letting users specify their own defaultAnalysisTypes using `druid.query.segmentMetadata.defaultAnalysisTypes`.