Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python, rust): implement unique counts for boolean datatype #16588

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

henrikig
Copy link
Contributor

Closes #16356

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels May 29, 2024
Copy link

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.34%. Comparing base (3fe4cfe) to head (38ff3bf).

Current head 38ff3bf differs from pull request most recent head 99e3c02

Please upload reports for the commit 99e3c02 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16588      +/-   ##
==========================================
- Coverage   81.37%   81.34%   -0.04%     
==========================================
  Files        1425     1424       -1     
  Lines      187669   187226     -443     
  Branches     2702     2698       -4     
==========================================
- Hits       152720   152301     -419     
+ Misses      34453    34429      -24     
  Partials      496      496              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from f92c262 to 7966d1d Compare May 30, 2024 08:32
@henrikig henrikig requested a review from ritchie46 May 30, 2024 18:57
@henrikig henrikig requested a review from mcrumiller June 4, 2024 11:17
@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from c8a3785 to 43c198d Compare June 4, 2024 11:20
return IdxCa::new(ca.name(), [] as [IdxSize; 0]);
}

let ca = ca.rechunk();
Copy link
Contributor

@mcrumiller mcrumiller Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ritchie46 can answer better here, but I think if we're spending the energy rechunking a series (which mem copies), the original array should receive the benefit as well, which you can do by making ca mut and calling *ca.rechunk().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be. However, I can't see this being done anywhere else in the codebase, which makes me question whether it should be done, so will wait for Ritchie's answer.

I guess you meant @ritchie46 btw.

Copy link

codspeed-hq bot commented Jun 4, 2024

CodSpeed Performance Report

Merging #16588 will not alter performance

Comparing henrikig:implement-unique-counts-for-boolean (99e3c02) with main (3fe4cfe)

Summary

✅ 37 untouched benchmarks

@henrikig henrikig requested a review from mcrumiller June 5, 2024 06:55
@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from e599f83 to 0473baa Compare June 5, 2024 10:01
@ritchie46
Copy link
Member

ritchie46 commented Jun 6, 2024

@henrikig can you use this trait for determining the unique values.

#16765

@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from cc9f4ef to 5940035 Compare June 6, 2024 11:09
@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from 5940035 to fb08134 Compare June 6, 2024 16:28
@henrikig
Copy link
Contributor Author

henrikig commented Jun 6, 2024

@henrikig can you use this trait for determining the unique values.

#16765

@ritchie46 I could make some simplifications in the code when the unique count is 1 (see 38ff3bf for details). Were there any other particular use cases for this trait that you thought of? If so its not immediately apparent to me as I think we still need the actual counts and order.

@henrikig henrikig force-pushed the implement-unique-counts-for-boolean branch from 38ff3bf to 2f34e60 Compare June 10, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series.unique_counts not supported for dtype bool
3 participants