-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): pl.min & pl.max accept wildcard similar to pl.sum #5511
Conversation
) | ||
.alias("max") | ||
let func = |s1, s2| { | ||
let df = DataFrame::new_no_checks(vec![s1, s2]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if this pattern is performance-vise acceptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've left a few minor remarks.
// use u32 as that is not cast to float as eagerly | ||
_ => lit(Series::new_empty("", &DataType::Int32)), | ||
}; | ||
fold_exprs(init, func, exprs).alias("max") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use reduce_expr
here. That one always takes the first column and saves a redundant operation.
And it is also simpler because we don't need to come up with a init
value.
// use u32 as that is not cast to float as eagerly | ||
_ => lit(Series::new_empty("", &DataType::Int32)), | ||
}; | ||
fold_exprs(init, func, exprs).alias("min") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above reduce_expr
.
res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.min(["*"])) | ||
assert res.to_series(0).series_equal(pl.Series("min", [1, 2, 3, 2, 1])) | ||
|
||
res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.max(["*"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we test this with pl.all()
as well? That is the ideomatic way to express that you want all columns.
pl.min and pl.max cannot handle wildcards as pl.sum.
pl.min([""]) becomes the same as pl.col(['']) because code impl handles [selected columns] as tokens before wildcard expansion.
this PR suggest to use the pl.sum impl pattern which relies on a fold that does wild card expansion with all bells and whistles.
Contra:
This PR do perform extra shallow copies which might hit performance if +1000 columns.
Maybe this can be fixed.