Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding filters for TimeBoundary on backend #3168

Merged
merged 10 commits into from
Aug 15, 2016

Conversation

rajk-tetration
Copy link
Contributor

@rajk-tetration rajk-tetration commented Jun 20, 2016

This adds support for adding filters in TimeBoundaryQuerys.

Questions:

  1. In TimeBoundaryQueryRunnerFactory.java, did I add the functionality in a reasonable way?
  2. In TimeBoundaryQueryRunnerTest.java, it isn't apparent that the filtering is working, since the current filter matches for all rows. I can't figure out how the dataSource() function works or how to find a list of dataSources and their associated logfiles, help on that would be appreciated as well.
  3. What other changes do I need to make to finalize this? (I'm guessing there's some stuff to do with the JSON queries)

Thanks.

@fjy
Copy link
Contributor

fjy commented Jun 21, 2016

@rajk-tetration Thanks for this contribution. We'll try to get to reviewing it soon.

@drcrallen
Copy link
Contributor

Tagging for 0.9.2 to make sure it doesn't get lost in the noise

@gianm
Copy link
Contributor

gianm commented Jun 24, 2016

@rajk-tetration, I haven't looked in detail yet, but I bet you'll also have to adjust filterSegments in TimeBoundaryQueryQueryToolChest (and add a test for the case where there is a filter).

For the test I see you had a comment like "no rows matched the filter--what is expected behavior in that case?". Probably the same as whatever an unfiltered timeBoundary would do if there were rows at all (I think that's returning an empty array, but double-check).

};

if (legacyQuery.getDimensionsFilter() != null) {
/* Should make a function? The only difference between these two parts is the descending boolean. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think making this a function would be good

@jon-wei
Copy link
Contributor

jon-wei commented Jun 24, 2016

@rajk-tetration

1.) I think the TimeBoundaryQueryRunnerFactory implementation is reasonable

2.) For the QueryRunnerTestHelper, the datasource isn't actually used if you call

public static <T, QueryType extends Query<T>> List<QueryRunner<T>> makeQueryRunners(
      QueryRunnerFactory<T, QueryType> factory
  )

that function uses a hard-coded set of segments.

For testing, MultiSegmentSelectQueryTest may be a useful reference.

The default druid.sample.tsv is problematic for testing this filtering behavior as you pointed out; MultiSegmentSelectQueryTest creates its own segments so you'll have control over the timestamps.

The multi segment structure may also be useful for testing the filterSegments behavior that @gianm mentioned.

  1. I think a change needed to finalize is to implement/verify that the filtering works on multiple segments.

@rajk-tetration
Copy link
Contributor Author

rajk-tetration commented Jun 28, 2016

Thanks for the comments.

@gianm I just deleted the filterSegments override and let it inherit; my reasoning was that there's no way to tell which segments are needed without the actual filter (I think similar to the TimeSeries case) - is this correct?

@jon-wei Thanks, that was really helpful, I was able to use it to do testing on multiple segments.

@gianm
Copy link
Contributor

gianm commented Jun 28, 2016

We really do want to filter the segments down when there is no filter - that's the common use case for timeBoundary and the optimization can make a huge difference.

@rajk-tetration
Copy link
Contributor Author

@gianm Right, that makes sense. Updated.

@Override
public Result<DateTime> apply(Cursor cursor)
{
if (cursor.isDone()) { return null; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting issue

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, should return null be on a separate line?

private final String bound;

@JsonCreator
public TimeBoundaryQuery(
@JsonProperty("dataSource") DataSource dataSource,
@JsonProperty("intervals") QuerySegmentSpec querySegmentSpec,
@JsonProperty("bound") String bound,
@JsonProperty("filter") DimFilter dimFilter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires updating the timeboundary query documentation in docs/content/querying

@fjy
Copy link
Contributor

fjy commented Jul 7, 2016

👍 after docs updated.

@gianm
Copy link
Contributor

gianm commented Jul 7, 2016

@rajk-tetration TimeBoundaryQuery.getCacheKey needs to be updated too.

@@ -9,6 +9,7 @@ Time boundary queries return the earliest and latest data points of a data set.
"queryType" : "timeBoundary",
"dataSource": "sample_datasource",
"bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
"filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is an "and" filter rather than just <filter>?

Copy link
Contributor Author

@rajk-tetration rajk-tetration Jul 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find an example of just using "filter": <filter> anywhere else in the docs, for example in TimeSeries an actual filter is given. I can change it, just wanted to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, fair enough, I guess this style predated your edit so feel free to leave it that way.

@rajk-tetration
Copy link
Contributor Author

rajk-tetration commented Jul 7, 2016

@gianm I'm not really sure what getCacheKey is, is it just a unique identifier for this query? Should I serialize the filter and add it to the buffer?

@ghost
Copy link

ghost commented Jul 29, 2016

@gianm - do you have any advice/references on implementing getCacheKey?

@gianm
Copy link
Contributor

gianm commented Jul 29, 2016

@raj-kesavan sorry for the delay, I missed your first message…

getCacheKey is used for the query cache, which is per-query, per-segment. It should include anything that might affect the query result on a per segment basis. Filters definitely should be included. You could check out TimeseriesQueryQueryToolChests -> getCacheStrategy -> computeCacheKey for an example of how to do that.

@fjy fjy changed the title [Discuss] Adding filters for TimeBoundary on backend Adding filters for TimeBoundary on backend Aug 3, 2016
@fjy fjy removed the Discuss label Aug 3, 2016
@fjy
Copy link
Contributor

fjy commented Aug 3, 2016

👍

@drcrallen
Copy link
Contributor

drcrallen commented Aug 5, 2016

I'm really confused here, is this a PR from @raj-kesavan or @rajk-tetration ?

((edit: had double the same name))

@drcrallen
Copy link
Contributor

((clarification is important for authorship and CLA guarantees)

{
return false;
}
public boolean hasFilters() { return dimFilter != null; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting is off here. You can use the IntelliJ/Eclipse style bundles in https://github.com/druid-io/druid/blob/master/CONTRIBUTING.md to format the code automatically.

@gianm
Copy link
Contributor

gianm commented Aug 5, 2016

@rajk-tetration looking good, just a few final comments from me!

@drcrallen
Copy link
Contributor

@rajk-tetration Thanks for the contribution! Would you mind filling out the CLA at http://druid.io/community/cla.html please?

@rajk-tetration
Copy link
Contributor Author

rajk-tetration commented Aug 5, 2016

@gianm Does this look okay? I was able to import the xml file into IntelliJ but pressing "reformat code" never does anything.

@drcrallen Will do, thanks

@gianm
Copy link
Contributor

gianm commented Aug 5, 2016

@rajk-tetration, interesting, I wonder if you need to choose "Druid code style" in your IntelliJ code style settings? Anyway the formatting does look better now.

👍 from me after travis + CLA are sorted.

@fjy
Copy link
Contributor

fjy commented Aug 9, 2016

@rajk-tetration did you get a chance to sign the CLA?

@fjy
Copy link
Contributor

fjy commented Aug 11, 2016

@rajk-tetration

@ghost
Copy link

ghost commented Aug 13, 2016

@fjy did you receive the CLA by e-mail?

@fjy
Copy link
Contributor

fjy commented Aug 15, 2016

Got CLA. We are good 👍

@fjy fjy merged commit 362b926 into apache:master Aug 15, 2016
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants