Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): pl.min & pl.max accept wildcard similar to pl.sum #5511

Merged
merged 3 commits into from
Nov 16, 2022

Conversation

sorhawell
Copy link
Contributor

pl.min and pl.max cannot handle wildcards as pl.sum.

pl.min([""]) becomes the same as pl.col(['']) because code impl handles [selected columns] as tokens before wildcard expansion.

this PR suggest to use the pl.sum impl pattern which relies on a fold that does wild card expansion with all bells and whistles.

Contra:
This PR do perform extra shallow copies which might hit performance if +1000 columns.
Maybe this can be fixed.

)
.alias("max")
let func = |s1, s2| {
let df = DataFrame::new_no_checks(vec![s1, s2]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if this pattern is performance-vise acceptable

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've left a few minor remarks.

// use u32 as that is not cast to float as eagerly
_ => lit(Series::new_empty("", &DataType::Int32)),
};
fold_exprs(init, func, exprs).alias("max")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use reduce_expr here. That one always takes the first column and saves a redundant operation.

And it is also simpler because we don't need to come up with a init value.

// use u32 as that is not cast to float as eagerly
_ => lit(Series::new_empty("", &DataType::Int32)),
};
fold_exprs(init, func, exprs).alias("min")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above reduce_expr.

res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.min(["*"]))
assert res.to_series(0).series_equal(pl.Series("min", [1, 2, 3, 2, 1]))

res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.max(["*"]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we test this with pl.all() as well? That is the ideomatic way to express that you want all columns.

@sorhawell sorhawell marked this pull request as ready for review November 15, 2022 14:52
@ritchie46 ritchie46 changed the title et pl.min & pl.max do wildcard as pl.sum feat(rust, python): pl.min & pl.max accept wildcard similar to pl.sum Nov 16, 2022
@ritchie46 ritchie46 merged commit b8716bd into pola-rs:master Nov 16, 2022
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants