feat: Series.hist #1859

camriddell · 2025-01-23T23:17:41Z

What type of PR is this? (check all applicable)

Related issues

Related issue [Enh]: Support for narwhals.Expr.hist #1561

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

Narwhals Expressions do not yet allow one to return a DataFrame (or a struct), so .hist is only implemented at the Series level to be compliant with the rest of the API.

The PyArrow implementation can likely be streamlined a bit more, so a review on that section would be appreciated.

MarcoGorelli · 2025-01-24T07:30:58Z

wow, amazing!

.hist is only implemented at the Series level to be compliant with the rest of the API.

Agreed, good design decision here. I think it would be quite awkward to add Expr.hist because:

struct dtype is not supported by default in pandas (and not at all in pandas before 2.0, or maybe 1.5, but certainly not before that)
it's a length-changing expression that doesn't make sense to aggregate (e.g. nw.col('a').unique().len() makes sense, but nw.col('a').hist().struct.field('value').sum() doesn't seem useful), so supporting this for pyspark / duckdb / ibis could be a real issue. Ibis does seem to have bucket, but there are no examples and I have no idea what it does
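
To make the "length-changing" point concrete, here is a minimal pure-Python sketch of a histogram. It is not the narwhals implementation: `hist_sketch`, its column names, and its bin convention (right-closed bins, with the first bin also including its left edge) are assumptions made for illustration. What it shows is that the output length is set by the number of bins, not by the length of the input.

```python
from bisect import bisect_left


def hist_sketch(values, bin_count):
    """Count values into `bin_count` equal-width bins (hypothetical helper).

    Returns two equal-length columns: "breakpoint" (right edge of each bin)
    and "count". Bins are right-closed; the first bin also includes its left
    edge. This convention is an assumption of the sketch.
    """
    if bin_count <= 0 or not values:
        return {"breakpoint": [], "count": []}
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it so the bins have non-zero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bin_count
    breakpoints = [lo + width * (i + 1) for i in range(bin_count)]
    breakpoints[-1] = hi  # avoid floating-point drift on the last edge
    counts = [0] * bin_count
    for v in values:
        # bisect_left finds the first breakpoint >= v, i.e. a right-closed bin.
        idx = min(bisect_left(breakpoints, v), bin_count - 1)
        counts[idx] += 1
    return {"breakpoint": breakpoints, "count": counts}


short = hist_sketch([1, 2, 3], bin_count=4)
long = hist_sketch(list(range(1, 101)), bin_count=4)
# Both results have 4 rows, although the inputs have 3 and 100 rows.
print(len(short["count"]), len(long["count"]))
```

Because the result has one row per bin, it cannot sit alongside the original column in the same frame, which is why a Series-level method returning a separate DataFrame is the more natural fit.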

Hi @mscolnick - just wanted to check that Series.hist would still be useful to you?

FBruzzesi

Thanks a ton @camriddell! I left a couple of comments and will go through the arrow implementation in more detail later today 🚀

FBruzzesi · 2025-01-24T08:38:17Z

narwhals/_pandas_like/series.py

+        from pandas import Categorical
+        from pandas import cut


We should use self.__native_namespace__() here as well, in place of importing from pandas directly. I can see that cudf has a cut function, as does modin.
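
A rough sketch of the pattern being suggested, using stand-in objects so it runs without pandas, cudf, or modin installed. Everything here (`make_backend`, `SketchSeries`, `bin_labels`) is hypothetical and only mimics the shape of a pandas-like series that looks up `cut` on its own native namespace instead of importing it from pandas.

```python
from types import SimpleNamespace


def make_backend(name):
    """Build a stand-in module exposing a `cut` function (hypothetical)."""

    def cut(values, bins):
        # Label each value with the index of its right-closed bin, or None.
        labels = []
        for v in values:
            label = None
            for i in range(len(bins) - 1):
                if bins[i] < v <= bins[i + 1]:
                    label = i
                    break
            labels.append(label)
        return {"backend": name, "labels": labels}

    return SimpleNamespace(cut=cut)


class SketchSeries:
    """Stand-in for a pandas-like series wrapper (not narwhals code)."""

    def __init__(self, values, native_namespace):
        self._values = values
        self._native_namespace = native_namespace

    def __native_namespace__(self):
        return self._native_namespace

    def bin_labels(self, bins):
        # Resolve `cut` from whichever backend this series wraps,
        # rather than hard-coding `from pandas import cut`.
        ns = self.__native_namespace__()
        return ns.cut(self._values, bins)


pandas_like = SketchSeries([1, 5, 9], make_backend("pandas"))
cudf_like = SketchSeries([1, 5, 9], make_backend("cudf"))
print(pandas_like.bin_labels([0, 4, 8, 12]))
print(cudf_like.bin_labels([0, 4, 8, 12]))
```

The same method body then works for every pandas-like backend that provides a compatible `cut`.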

FBruzzesi · 2025-01-24T08:39:01Z

narwhals/_arrow/series.py

+        import pyarrow as pa
+        import pyarrow.compute as pc


These are now imported at the top of the file anyway.

Suggested change

-        import pyarrow as pa
-        import pyarrow.compute as pc

camriddell added 4 commits January 23, 2025 15:05

add hist scaffolding & tests

f75c540

implement hist for series & tests

92ae425

refactor pyarrow hist & allow pandas bin_count=0

35d1bc6

add expected edgecases to hist tests

6901b1d

FBruzzesi reviewed Jan 24, 2025

FBruzzesi added the enhancement New feature or request label Jan 24, 2025
