-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: native types in DistinctCountAccumulator
for primitive types
#8721
feat: native types in DistinctCountAccumulator
for primitive types
#8721
Conversation
Benchmark results are
The target query was clickbench q9 -- there are not so many queries where DF plans groups accumulator for count distinct. (x) marked queries were constantly killed by OOM on my laptop, so they were replaced with dummy |
CC @alamb @Dandandan It would be great, to get review / opinion on this PR from you, as from original issue participants, when (if) you have some time. |
Thank you @korowa -- I will review this PR tomorrow |
T: ArrowPrimitiveType + Send, | ||
{ | ||
/// Vector for storing unique values sets for each group index | ||
unique_values: Vec<HashSet<Hashable<T::Native>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hoped we could store this in something like HashSet<(usize, T::Native)>
, i.e. we could ammortize the allocations for all the groups instead of creating a HashSet
for each group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial intention -- but collecting intermediate states is quite painful when using pure HashSet<(group, value)>
, and using additional structures for storing chains of values for each group seemed like a memory overhead in this case.
However, I agree that managing single hash table per accumulator makes sense -- maybe the proper way of speeding up accumulator will be switching to native types in regular (current) accumulator first, and then start digging into proper groups accumulator implementation which will provide noticeable boost 🤔.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the proper way of speeding up accumulator will be switching to native types in regular (current) accumulator first
I'll try this option then, I guess, at first -- anyway it was going to be follow up for current version.
|
||
impl<T: ToByteSlice> std::hash::Hash for Hashable<T> { | ||
fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
self.0.to_byte_slice().hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only needed for float types? It would probably be faster to use the native hash implementation for other (integer) types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it only required for floats (and it only makes sense to use it for floats, cause default implementations of these traits for integers outperforms this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to have an implementation for primitive and float types instead one using ToByteSlice
then?
I am sorry -- I ran out of time today to give this a thorough review but I plan to do it first thing tomorrow |
DistinctCountGroupsAccumulator
for primitive typesDistinctCountAccumulator
for primitive types
UPD: after the comment by @Dandandan, regarding implementation of groups accumulator, I've realized that it's unclear (as for me) now how to implement it decently, and in the same time naive implementation will still require non-zero efforts to support it. In the same time, performance improvement mostly comes from replacement of The results for clickbench for updated branch:
Also works here |
|
||
Float16 => float_distinct_count_accumulator!(Float16Type), | ||
Float32 => float_distinct_count_accumulator!(Float32Type), | ||
Float64 => float_distinct_count_accumulator!(Float64Type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now, but if we would like to do it for strings / bytes, we could do use a datastructure like this to get maximal performance:
/// Contains hashes and offsets for given hash (+ potential collisions), use `RawTable` for extra speed
uniques: HashMap<u64, SmallVec<u64; 1>>,
/// actual string/byte data, can be emitted cheaply / free
values: BufferBuilder<u8>,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to the idea in #7064
Maybe we can eventually use the same data structure (specialized for storing string values not using a String
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @korowa -- I think this PR is a significant improvement over the initial implementation.
I think the only thing that is critical to fix prior to merging is the size calculation for the HashSet
As @Dandandan points out, there are clearly some additional improvements that could be added that we should consider before merging but I also don't think they are critical.
Thanks again -- this is a really great step forward
|
||
Float16 => float_distinct_count_accumulator!(Float16Type), | ||
Float32 => float_distinct_count_accumulator!(Float32Type), | ||
Float64 => float_distinct_count_accumulator!(Float64Type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to the idea in #7064
Maybe we can eventually use the same data structure (specialized for storing string values not using a String
)
let arr = Arc::new(PrimitiveArray::<T>::from_iter_values( | ||
self.values.iter().cloned(), | ||
)) as ArrayRef; | ||
let list = Arc::new(array_into_list_array(arr)) as ArrayRef; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 for @jayzhan211 for switching the native implementation of ScalarValue::List
to use an ArrayRef
@@ -336,9 +336,18 @@ where | |||
} | |||
|
|||
fn size(&self) -> usize { | |||
let estimated_buckets = (self.values.len().checked_mul(8).unwrap_or(usize::MAX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can (as a follow on PR) put this logic into its own function (with comments) as estimating the size of hashbrown hashtables is likely to come up again
Thank you @korowa this is awesome 🚀 |
Which issue does this PR close?
Part of #5472.
Rationale for this change
The main idea is to avoid
ScalarValue
wrappers for all update/merge operations to improve performance ofDistinctCountAccumulator
What changes are included in this PR?
Hashable
newtype for arrow primitives moved to utils module -- required to use float types as hashmap/hashset keysNativeDistinctCountAccumulator
-- implementation of accumulator for primitive types with implementedEq + Hash
traints (integer-based types)FloatDistinctCountAccumulator
-- implementation of accumulator for float-based types throughHashable
wrapperAre these changes tested?
Existing test coverage + additional test for bigint (
i256
-- base forDecimal256Type
).Are there any user-facing changes?
No