Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Series.hist #1859

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

feat: Series.hist #1859

wants to merge 4 commits into from

Conversation

camriddell
Copy link
Contributor

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Narwhals Expressions do not yet allow one to return a DataFrame (or a struct), so .hist is only implemented at the Series level to be compliant with the rest of the API.

The PyArrow implementation can likely be streamlined a bit more, so a review on that section would be appreciated.

@MarcoGorelli
Copy link
Member

wow, amazing!

.hist is only implemented at the Series level to be compliant with the rest of the API.

agree, good design decision here, I think it would be quite awkward to add Expr.hist because:

  • struct dtype is not supported by default in pandas (and not at all in pandas pre 2.0. Maybe 1.5. But not before that)
  • it's a length-changing expression which it doesn't make sense to aggregate (e.g. nw.col('a').unique().len() makes sense but nw.col('a').hist().struct.field('value').sum() doesn't seem useful..), so supporting this for pyspark / duckdb / ibis could be a real issue. Ibis does seem to have bucket but there's no examples and I have no idea what it does

Hi @mscolnick - just wanted to check that Series.hist would still be useful to you?

Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton @camriddell ! I left a couple of comments and will go through the arrow implementation in more details later today πŸš€

Comment on lines +1042 to +1043
from pandas import Categorical
from pandas import cut
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use self.__native_namespace__() here as well in place of pandas. I can see that cudf has a cut function, as well as modin

Comment on lines +1017 to +1018
import pyarrow as pa
import pyarrow.compute as pc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are now imported on the top of the file anyway

Suggested change
import pyarrow as pa
import pyarrow.compute as pc

@FBruzzesi FBruzzesi added the enhancement New feature or request label Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants