-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix boolean distinct #16765
fix: Fix boolean distinct #16765
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #16765 +/- ##
=======================================
Coverage 81.32% 81.32%
=======================================
Files 1424 1424
Lines 187215 187223 +8
Branches 2726 2726
=======================================
+ Hits 152250 152267 +17
+ Misses 34468 34459 -9
Partials 497 497 ☔ View full report in Codecov by Sentry. |
This comment was marked as resolved.
This comment was marked as resolved.
CodSpeed Performance ReportMerging #16765 will improve performances by 20.14%Comparing Summary
Benchmarks breakdown
|
} else { | ||
let values = self.values() & self.validity().unwrap(); | ||
let unset_bits = self.values().unset_bits(); | ||
3 - usize::from(unset_bits == 0 || unset_bits == values.len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritchie46 Something seems off in this else branch, and I'm not getting the numbers I'd expect when trying to use this as you advised in #16588.
Take for instance the case when you have an array with null+true or null+false, you will never get a number less than 2 with this logic, while it should actually be 1 if I understand the function signature correctly.
Also, we're using the unset_bits()
from the values directly when we know that the array has nulls in it, which means the bits could be anything if I understand it correctly.
Some examples of what I get, and what I would expect:
[True, None, True, None, None] # got 3, expected 1
[False, None, False, None, None] # got 2, expected 1
[True, False, None, False, None, None] # got 3, expected 2
Added a debug_assert_eq
in my code as well as below, which fails:
let num_unique = usize::from(n_true > 0) + usize::from(n_false > 0);
let distinct_count = arr.distinct_count();
debug_assert_eq!(num_unique, distinct_count);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a new PR, see #16780
No description provided.