Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix boolean distinct #16765

Merged
merged 1 commit into from
Jun 6, 2024
Merged

fix: Fix boolean distinct #16765

merged 1 commit into from
Jun 6, 2024

Conversation

ritchie46
Copy link
Member

No description provided.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Jun 6, 2024
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.32%. Comparing base (2398b47) to head (dc99ab9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16765   +/-   ##
=======================================
  Coverage   81.32%   81.32%           
=======================================
  Files        1424     1424           
  Lines      187215   187223    +8     
  Branches     2726     2726           
=======================================
+ Hits       152250   152267   +17     
+ Misses      34468    34459    -9     
  Partials      497      497           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coastalwhite

This comment was marked as resolved.

Copy link

codspeed-hq bot commented Jun 6, 2024

CodSpeed Performance Report

Merging #16765 will improve performances by 20.14%

Comparing null_count (dc99ab9) with main (2398b47)

Summary

⚡ 1 improvements
✅ 36 untouched benchmarks

Benchmarks breakdown

Benchmark main null_count Change
test_groupby_h2oai_q1 2.7 ms 2.3 ms +20.14%

@ritchie46 ritchie46 merged commit 5a0c803 into main Jun 6, 2024
22 checks passed
@ritchie46 ritchie46 deleted the null_count branch June 6, 2024 10:46
} else {
let values = self.values() & self.validity().unwrap();
let unset_bits = self.values().unset_bits();
3 - usize::from(unset_bits == 0 || unset_bits == values.len())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ritchie46 Something seems off in this else branch, and I'm not getting the numbers I'd expect when trying to use this as you advised in #16588.

Take for instance the case when you have an array with null+true or null+false, you will never get a number less than 2 with this logic, while it should actually be 1 if I understand the function signature correctly.

Also, we're using the unset_bits() from the values directly when we know that the array has nulls in it, which means the bits could be anything if I understand it correctly.

Some examples of what I get, and what I would expect:

[True, None, True, None, None] # got 3, expected 1
[False, None, False, None, None] # got 2, expected 1
[True, False, None, False, None, None] # got 3, expected 2

Added a debug_assert_eq in my code as well as below, which fails:

    let num_unique = usize::from(n_true > 0) + usize::from(n_false > 0);
    let distinct_count = arr.distinct_count();
    debug_assert_eq!(num_unique, distinct_count);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a new PR, see #16780

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants