-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement specialized group values for single Uft8/LargeUtf8/Binary/LargeBinary column #8827
Conversation
f78bea7
to
886666b
Compare
This comment was marked as outdated.
This comment was marked as outdated.
9224a38
to
28d9105
Compare
This comment was marked as outdated.
This comment was marked as outdated.
7c23abb
to
09b3c5f
Compare
I think this is getting pretty close. I need to add some more fuzz tests and maybe try to make this code generic and work for BinaryArray but otherwise it is ready |
@@ -92,403 +86,3 @@ impl<O: OffsetSizeTrait> Accumulator for StringDistinctCountAccumulator<O> { | |||
std::mem::size_of_val(self) + self.0.size() | |||
} | |||
} | |||
|
|||
/// Maximum size of a string that can be inlined in the hash table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code and tests were moved into bytes_map.rs
and made general
1009b85
to
6ee2e03
Compare
6ee2e03
to
da68c45
Compare
Update here is I have written a fuzz test (which actually caught several bugs 🐛 ). I plan to take a final pass through the code tomorrow morning and get it up for review (It is a pretty large PR so I expect it will take time to review, but it will be worth it I think) Update:
|
da68c45
to
489c4c1
Compare
ccb0e9b
to
5854997
Compare
7f0b188
to
5a2a3d3
Compare
46ba28c
to
637908c
Compare
d9fa058
to
637908c
Compare
637908c
to
dc55c1b
Compare
Update here is that the benchmark programs consistently shows this query slower, but I can't reproduce the difference. I am continuing to investigate
|
3de9042
to
b2fe2c7
Compare
…mn string grouping compiles. add test for null
b2fe2c7
to
53e7274
Compare
}); | ||
// SAFETY: the offsets were constructed correctly in `insert_if_new` -- | ||
// monotonically increasing, overflows were checked. | ||
let offsets = unsafe { OffsetBuffer::new_unchecked(ScalarBuffer::from(offsets)) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating the offsets actually took significant time which took me a while to find and trackdown (without this unsafe call ClickBench Q28 slowed down by 10%)
FYI @thinkharderdev and @Dandandan as you have expressed interest in this type of optimization before, this PR is now ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Thank you for the review @Dandandan 🙏 |
🚀 |
Note this looks like a large PR, but a substantial amount of it is moving code and tests and documentation
Which issue does this PR close?
closes #7064
Rationale for this change
This PR makes queries like this faster (grouping on strings):
ClickBench seems slightly slower in 34.0.0 than in 33.0.0: #8789 (comment) so we need some more real wins
One thing that takes significant time in queries is grouping on a single string column, where the current GroupsAccumulator copies the strings several times:
Rows
,OwnedRow
,Rows
to form the final output ArrayWe can make such queries faster by avoiding the copies
What changes are included in this PR?
SSOStringSet
that @jayzhan211 and I implemented in OptimizeCOUNT( DISTINCT ...)
for strings (up to 9x faster) #8849 intoArrowBinaryMap
which also handles values and both string and binaryCOUNT DISTINCT
specialization to handle binary/large binaryArrowBinaryMap
to implementGroupValuesByes
, aGroupsValues
implementation for String / LargeString / Binary / LargeBinary dataAre these changes tested?
Yes:
MemTable::with_sort_exprs
#9190)Are there any user-facing changes?
Faster performance.
For Clickbench, several of the longest running queries got significantly faster. This is almost certainly the result of less string copying as they are grouping by
"Url"
, a string columnThere are two that are reported to be slower, but I think given they run in 10s of ms this is likely a measurement error
For ClickBench extended things got faster:
For TPCH there were also several queries that got faster, but the time spent is so small (10s of ms) that I think it likely measurement error
Full Performance Results
TODOs
ArrowStringMap
ArrowStringMap
upstream to arrow-rs (could use the DictionaryArray builder)