Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional native query tests for unnest datasource #13554

Merged
merged 17 commits into from
Jan 25, 2023

Conversation

somu-imply
Copy link
Contributor

@somu-imply somu-imply commented Dec 12, 2022

Adds the tests and changes for native tests for unnest in Scan, TopN, GroupBy etc.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@somu-imply somu-imply marked this pull request as ready for review December 17, 2022 18:55
@clintropolis clintropolis changed the title Native query runners for unnest additional native query tests for unnest datasource Jan 6, 2023
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding more tests, just some minor comments 🚀

* under the License.
*/

package org.apache.druid.query;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if this was in org.apache.druid.query.topn with all the other topn query tests then wouldn't need change method visibility in PooledTopNAlgorithm

* under the License.
*/

package org.apache.druid.query;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: imo this should live in org.apache.druid.query.groupby

* under the License.
*/

package org.apache.druid.query;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should live in org.apache.druid.query.scan

Comment on lines 69 to 70
public static final ObjectMapper DEFAULT_MAPPER = TestHelper.makeSmileMapper();
public static final DruidProcessingConfig DEFAULT_PROCESSING_CONFIG = new DruidProcessingConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since these and some other stuff are public in the standard GroupByQueryRunnerTest I wonder if we should just use those instead of redefining here

@Test
public void testGroupByOnUnnestedColumn()
{
// Cannot vectorize due to extraction dimension spec.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment isn't true

);

Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
TestHelper.assertExpectedObjects(expectedResults, results, "hroupBy-on-unnested-column");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
TestHelper.assertExpectedObjects(expectedResults, results, "hroupBy-on-unnested-column");
TestHelper.assertExpectedObjects(expectedResults, results, "groupBy-on-unnested-column");

Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
TestHelper.assertExpectedObjects(expectedResults, results, "hroupBy-on-unnested-column");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to also have a test with a virtual column, maybe both mv_to_array and making a composite array from multiple columns with array would be interesting

}

@Test
public void testUnnestRunner()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: testScan?

}

@Test
public void testUnnestRunnerVirtualColumns()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think would also be interesting to have a test on a composite array column using array function to create an array from multiple columns (in addition to mv_to_array test)

}

@Test
public void testTopNStringVirtualColumnUnnest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment about virtual column created from multiple columns using array in addition to mv_to_array test

@somu-imply somu-imply changed the title additional native query tests for unnest datasource Additional native query tests for unnest datasource Jan 10, 2023
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand better why you had to pipe things down into the QueryRunner. You shouldn't've needed to do that, it makes me think that these tests aren't running anything that actually does the segment mapping function...

Comment on lines 481 to 493
public static <T, QueryType extends Query<T>> List<QueryRunner<T>> makeUnnestQueryRunners(
QueryRunnerFactory<T, QueryType> factory,
String dimensionToUnnest,
String outputColumn,
LinkedHashSet<String> allowSet
)
{
final IncrementalIndex rtIndex = TestIndex.getIncrementalTestIndex();
final IncrementalIndex noRollupRtIndex = TestIndex.getNoRollupIncrementalTestIndex();
final QueryableIndex mMappedTestIndex = TestIndex.getMMappedTestIndex();
final QueryableIndex noRollupMMappedTestIndex = TestIndex.getNoRollupMMappedTestIndex();
final QueryableIndex mergedRealtimeIndex = TestIndex.mergedRealtimeIndex();
final QueryableIndex frontCodedMappedTestIndex = TestIndex.getFrontCodedMMappedTestIndex();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to why you have to do this? The Unnest is defined as part of the query, not as part of the query runner, you should be able to just send the query along and it will attach the unnest on the segment via the segment mapper functions, no?

Comment on lines 49 to 55
public UnnestSegment(Segment baseSegment, String dimension, String outputName, LinkedHashSet<String> allowList)
{
this.baseSegment = baseSegment;
this.dimension = dimension;
this.renamedOutputDimension = outputName;
this.allowSet = allowList;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this should actually be required. The segment mapping fns should be doing this as part of query processing.

Comment on lines +344 to +353
makeRow(
query,
"2011-04-01",
"alias",
"automotive",
"rows",
2L,
"idx",
270L
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit, I don't think these are actually wide enough to require each of the arguments being on independent lines. The test would read a lot easier if each of the makeRow() calls was its own line.

Map<String, Object> event = new HashMap<>();
String[] values1 = input.split("\\t");
for (int i = 0; i < dimSpecs.length; i++) {
if (dimSpecs[i] == null || i >= dimSpecs.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this CodeQL thing is likely legitimate

@somu-imply
Copy link
Contributor Author

Migrated towards using the segmentMapFunction along with SegmentReference to run the queries. Removed the earlier UnnestSegment part. Added scan, groupBy and topN queries with array and mv_to_array cases for virtual columns along with other regular test cases. Please review

private final boolean vectorize;

@Rule
public ExpectedException expectedException = ExpectedException.none();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [ExpectedException.none](1) should be avoided because it has been deprecated.
.idea/misc.xml Outdated
Comment on lines 93 to 96
<component name="SuppressKotlinCodeStyleNotification">
<option name="disableForAll" value="true" />
</component>
</project>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this file appear to still be checked in? We cannot merge while this is part of the PR.

Copy link
Contributor Author

@somu-imply somu-imply Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge #13576 and then I'll merge the master with this branch and take care of this one. The UnnestColumnValueSelectorCursor is updated in that PR and this needs conflict resolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is corrected now

@imply-cheddar
Copy link
Contributor

The CodeQL stuff seems annoying, but if we look at what it's saying and try to make it better, I think it actually does leave the code in a better state than we found it in, so let's try to resolve those CodeQL things.

@somu-imply
Copy link
Contributor Author

somu-imply commented Jan 24, 2023

Updated to remove duplicated code, unused exception and moved to Numbers.tryParse to avoid the possible issues CodeQL was pointing to.

Edit: The numbers.parse has some NPEs currently binding the codeQL part inside try catch blocks

@cheddar cheddar merged commit 17c0167 into apache:master Jan 25, 2023
abhagraw pushed a commit to abhagraw/druid that referenced this pull request Feb 8, 2023
@clintropolis clintropolis added this to the 26.0 milestone Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants