-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support sorting on complex columns in MSQ #16322
Conversation
processing/src/main/java/org/apache/druid/frame/key/RowKeyComparisonRunLengths.java
Fixed
Show fixed
Hide fixed
processing/src/main/java/org/apache/druid/frame/read/FrameReaderUtils.java
Fixed
Show fixed
Hide fixed
processing/src/main/java/org/apache/druid/frame/read/FrameReaderUtils.java
Fixed
Show fixed
Hide fixed
processing/src/main/java/org/apache/druid/frame/read/FrameReaderUtils.java
Fixed
Show fixed
Hide fixed
processing/src/test/java/org/apache/druid/frame/write/FrameWriterTest.java
Fixed
Show fixed
Hide fixed
...e/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByQueryKit.java
Fixed
Show fixed
Hide fixed
processing/src/main/java/org/apache/druid/frame/key/FrameComparisonWidgetImpl.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add uts for
- The runLength stuff
- ByteComparators
- Frame comparators
- ByteFrameComparators.
processing/src/main/java/org/apache/druid/frame/key/RowKeyComparisonRunLengths.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/RowKeyComparisonRunLengths.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/RowKeyComparisonRunLengths.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/RowKeyComparisonRunLengths.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/ByteRowKeyComparator.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/ByteRowKeyComparator.java
Outdated
Show resolved
Hide resolved
Thanks for the review @cryptoe. I have addressed the comments and added tests for all the comparisons that I have added as a part of this patch.
|
While making the changes suggested in #16322 (comment), I realized this would be a backward incompatible change. This will modify the way the complex fields are written on the frame. Since frames are materialized in durable storage, frames generated by the newer byte-comparable writers won't get read properly by the older clusters, and the newer ones won't read the frames generated by the older clusters. |
...ns-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
Dismissed
Show dismissed
Hide dismissed
...ns-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
Dismissed
Show dismissed
Hide dismissed
...ns-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
Dismissed
Show dismissed
Hide dismissed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the line comments, please run a benchmark to verify that performance has not degraded for the non-complex case. You can use FrameChannelMergerBenchmark
. For extra credit, extend it to optionally include complex keys and benchmark that as well.
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
Outdated
Show resolved
Hide resolved
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
Outdated
Show resolved
Hide resolved
@@ -1086,6 +1086,8 @@ public void testGroupByRootSingleTypeStringMixed2Sparse() | |||
@Test | |||
public void testGroupByRootSingleTypeStringMixed2SparseJsonValueNonExistentPath() | |||
{ | |||
// Fails while planning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSQ requires the scan signature, however the path doesn't exist. Therefore https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java#L1279 returns null
when the column capabilities is asked, while the planner requires the capability to determine the column's type https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java#L1733-L1733.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment with the reason
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
Outdated
Show resolved
Hide resolved
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
Outdated
Show resolved
Hide resolved
assert runLengthEntry.getRunLength() == 1; | ||
// 'fieldsComparedTillNow' is the index of the current keyColumn in the keyColumns list. Sanity check that its | ||
// a known complex type | ||
ColumnType columnType = signature.getColumnType(keyColumns.get(fieldsComparedTillNow).columnName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid this map lookup for getColumnType(String)
and also avoid the lookup by string in serdeMap
. That's expensive stuff to be doing for each comparison. Instead, how about doing a serdes
that is a pre-built ComplexMetricsSerde[]
(in the constructor), indexed by position in keyColumns
. Each entry in serdes
is either a serde (for complex types) or null
otherwise. The entire serdes
should be null if there are no complex types.
processing/src/test/java/org/apache/druid/frame/key/ByteRowKeyComparatorTest.java
Show resolved
Hide resolved
processing/src/test/java/org/apache/druid/frame/key/ByteRowKeyComparatorTest.java
Outdated
Show resolved
Hide resolved
assert runLengthEntry.getRunLength() == 1; | ||
// 'fieldsComparedTillNow' is the index of the current keyColumn in the keyColumns list. Sanity check that its | ||
// a known complex type | ||
ColumnType columnType = rowSignature.getColumnType(keyColumns.get(fieldsComparedTillNow).columnName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to FrameComparisonWidgetImpl
: we should avoid this map lookup for getColumnType(String)
and also avoid the lookup by string in serdeMap
. That's expensive stuff to be doing for each comparison. Instead, how about doing a serdes
that is a pre-built ComplexMetricsSerde[]
(in the constructor), indexed by position in keyColumns
. Each entry in serdes
is either a serde (for complex types) or null
otherwise. The entire serdes
should be null if there are no complex types.
I don't think rowSignature
is used for anything else, so when we make this change we can also get rid of the rowSignature
field.
import java.util.Collections; | ||
import java.util.List; | ||
|
||
public class RowKeyComparisonRunLengthsTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a situation like this, I like to have one test that does a whole space of things. Here I suggest a test case that checks all possible length-3 keys where we vary each key element's type between string and complex, and direction between asc and desc. So that's 4 possibilities for each key element (string asc, string desc, complex asc, complex desc) and therefore 64 keys we're testing. It will run fast since it's only 64 cases, and it gets good coverage of different possibilities for runs and orderings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering having a parameterized test, however, how do I add the expected results for each test case generically, without repeating the logic in RowKeyComparisonRunLengths
.
For example, if the input test case is "complex ASC, string DESC, string DESC"; how can I get the expected value of the result, without repeating what I have written in RowKeyComparisonRunLength
? One way would be iterating the results of all 64 combinations, which can be done, but I wanted to confirm if that's what you referred to in the comment.
The test case would read something like:
assertionHelper(
HelperUtils.createCheckerFor(new RunLength(false, 1, ASC), new RunLength(true, 2, DESC))
RowKeyComparisonRunLength.create("complex ASC, string DESC, string DESC")
);
.... (total 64 entries)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of having all 64 test cases enumerated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the test cases - the tests are generated programmatically in the loop (mapping each index to the corresponding permutation of the key column), and the expectations are enumerated corresponding to each test case
Regarding the I have fixed up those test cases and resolved the comments pertaining to those tests. |
Thank you! Good to know, I was worried something was wrong with the implementation on the MSQ side. |
Benchmarked the pre-existing code, there isn't a regression in the byte comparable types:
I'll update this comment once I benchmark the complex comparison code as well |
Thanks for a thorough review @gianm 😄
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recent changes are looking good to me. I had a few cleanup comments.
processing/src/main/java/org/apache/druid/frame/field/ComplexFieldReader.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/ByteRowKeyComparator.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/ByteRowKeyComparator.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/FrameComparisonWidgetImpl.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/key/FrameComparisonWidgetImpl.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/frame/write/FrameWriterUtils.java
Outdated
Show resolved
Hide resolved
@@ -7072,6 +7090,8 @@ public void testUnnestJsonQueryArraysJsonValueSum() | |||
@Test | |||
public void testJsonValueNestedEmptyArray() | |||
{ | |||
// Returns incorrect results with MSQ | |||
msqIncompatible(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, we can leave this for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM!!.
processing/src/main/java/org/apache/druid/frame/write/FrameWriterUtils.java
Fixed
Show fixed
Hide fixed
Thanks @LakshSingla for the updates. |
MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially. This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ. The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example: longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC, the comparison will happen like: longCol1, longCol2 (ASC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending order complexCol1 (DESC) - Compared via deserialization, cannot be clubbed with any other field complexCol2 (DESC) - Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same order stringCol1, longCol3 (DESC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending order longCol4 (ASC) - Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was different This way, we only deserialize the field wherever required
* Support sorting on complex columns in MSQ (#16322) MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially. This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ. The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example: longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC, the comparison will happen like: longCol1, longCol2 (ASC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending order complexCol1 (DESC) - Compared via deserialization, cannot be clubbed with any other field complexCol2 (DESC) - Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same order stringCol1, longCol3 (DESC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending order longCol4 (ASC) - Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was different This way, we only deserialize the field wherever required * Fix conflicts --------- Co-authored-by: Laksh Singla <[email protected]>
Before the patch
After the patch
|
Description
MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially.
This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ.
The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example:
longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC
, the comparison will happen like:longCol1, longCol2 (ASC)
- Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending ordercomplexCol1 (DESC)
- Compared via deserialization, cannot be clubbed with any other fieldcomplexCol2 (DESC)
- Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same orderstringCol1, longCol3 (DESC)
- Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending orderlongCol4 (ASC)
- Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was differentThis way, we only deserialize the field wherever required
Backward Compatibility
No backward incompatible change has been made.
Future work
Complex types can expose and implement special handling for types such as SerializablePairs, which can be serialized in a byte-comparable way and won't need to be deserialized for comparison. It will fit in easily with the current changes.
Release note
MSQ now supports sorting on all the complex columns, and grouping on the complex columns that support it.
Key changed/added classes in this PR
MyFoo
OurBar
TheirBaz
This PR has: