-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazily compute the null count of Array
to reduce cpu cost in high cardinality aggr
#6146
Labels
enhancement
Any new improvement worthy of a entry in the changelog
Comments
Rachelint
added
the
enhancement
Any new improvement worthy of a entry in the changelog
label
Jul 28, 2024
I have made a poc that impl it in the atomic way and introduce it to datafusion: And the q32 in clickbench seems 10~15% faster.
|
Rachelint
changed the title
Lazily compute the null count of
Lazily compute the null count of Jul 28, 2024
Array
Array
to reduce cpu cost in slice case
Rachelint
changed the title
Lazily compute the null count of
Lazily compute the null count of Jul 28, 2024
Array
to reduce cpu cost in slice caseArray
to reduce cpu cost of slice function
Rachelint
changed the title
Lazily compute the null count of
Lazily compute the null count of Jul 29, 2024
Array
to reduce cpu cost of slice functionArray
to reduce cpu cost in high cardinality aggr
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When
group by
columns are of very high cardinality, like clickbench q32, the computation of null count inslice
function has the non-trivial cpu cost actually(see flamegraph in additional context).Actually, the null count in
arrow-cpp
will be computed lazily in the scenario that we need to scan the null buffer to get it (such as slice case).https://github.com/apache/arrow/blob/187197c369058f7d1377c1b161c469a9e4542caf/cpp/src/arrow/array/data.cc#L206-L218
Should
arrow-rust
makenull count
a lazy computation to reduce the cost ofslice
, too?Describe the solution you'd like
Make the
null_count
inNullBuffer
anAtomicI64
likearrow-cpp
.But I worry other related function calling
null_count
will suffer the performance regression due to introducing the atomic...Describe alternatives you've considered
Maybe we can optimize the
slice()
function inNullBuffer
only, we check the inputs and found low cost to calculate the newnull_count
.For example,
We calculate the
null_count in [9, 10)
and the needednull_count in [0, 9)
=original null count
-null_count in [9, 10)
.Additional context
The text was updated successfully, but these errors were encountered: