fix(python) pl.min(['*']) and pl.max(['*']) cannot handle wildcards like pl.sum(["*"]) #5512

Closed
2 tasks done
sorhawell opened this issue Nov 15, 2022 · 1 comment
Labels: bug (Something isn't working), python (Related to Python Polars)

@sorhawell (Contributor) commented Nov 15, 2022

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

pl.min([str1, str2, str3, ...]) and pl.max([str1, str2, str3, ...]) have their own implementations, which cannot handle a wildcard because they resolve the expression immediately.

This issue is fixed in PR #5511 by applying the pl.sum([...]) implementation pattern to pl.min and pl.max. However, someone should check whether the vector allocation in that change is acceptable performance-wise.
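The difference between resolving the inputs immediately and deferring wildcard expansion can be sketched with a toy expression model in plain Python. Everything here is hypothetical: the classes Col, Pair and Fold and the helpers min_eager and min_deferred are invented for illustration and are an assumption about the general shape of the problem, not the actual Polars internals or the code in #5511.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical toy model for illustration only; these classes are NOT
# Polars internals. A "frame" maps column names to equal-length lists.
Frame = Dict[str, List[int]]
BinOp = Callable[[int, int], int]


class Expr:
    def evaluate(self, frame: Frame) -> Frame:
        raise NotImplementedError


@dataclass
class Col(Expr):
    name: str  # a column name, or the wildcard "*"

    def evaluate(self, frame: Frame) -> Frame:
        # The wildcard is expanded here, against a concrete frame.
        if self.name == "*":
            return dict(frame)
        return {self.name: frame[self.name]}


@dataclass
class Pair(Expr):
    """Element-wise combination of two already-built expressions."""
    left: Expr
    right: Expr
    op: BinOp
    name: str

    def evaluate(self, frame: Frame) -> Frame:
        (lhs,) = self.left.evaluate(frame).values()
        (rhs,) = self.right.evaluate(frame).values()
        return {self.name: [self.op(a, b) for a, b in zip(lhs, rhs)]}


@dataclass
class Fold(Expr):
    """Row-wise fold; its inputs are expanded only at evaluation time."""
    inputs: List[Expr]
    op: BinOp
    name: str

    def evaluate(self, frame: Frame) -> Frame:
        columns: List[List[int]] = []
        for expr in self.inputs:
            columns.extend(expr.evaluate(frame).values())
        return {self.name: [reduce(self.op, row) for row in zip(*columns)]}


def min_eager(names: List[str]) -> Expr:
    # Combines the listed expressions pairwise while the expression is built.
    # reduce() over a one-element list returns that element untouched, so
    # ["*"] comes back as the bare wildcard column.
    exprs: List[Expr] = [Col(n) for n in names]
    return reduce(lambda acc, e: Pair(acc, e, min, "min"), exprs)


def min_deferred(names: List[str]) -> Expr:
    # Wraps the inputs in a fold, so "*" is expanded against the frame's
    # columns only when the expression is evaluated.
    return Fold([Col(n) for n in names], min, "min")


frame: Frame = {"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]}
print(min_eager(["*"]).evaluate(frame))     # both columns, unchanged
print(min_deferred(["*"]).evaluate(frame))  # {'min': [1, 2, 3, 2, 1]}
```

In this model the eager builder only ever sees the literal list ["*"], so it has nothing to combine, while the fold sees the expanded column list at evaluation time.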

Reproducible example

import polars as pl

fruits_cars = pl.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "fruits": ["banana", "banana", "apple", "apple", "banana"],
    "B": [5, 4, 3, 2, 1],
    "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
})


# try using a wildcard
fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.min(["*"]))

# hmm, it just returned all columns untouched
shape: (5, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 5   ┆ 1   │
└─────┴─────┘

# let's inspect only the returned expression
print(pl.min(["*"]))

# oh, it returns the same as pl.col("*"); that is very unexpected
*

Expected behavior

def test_max_min_wildcard_columns(fruits_cars: pl.DataFrame) -> None:
    res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.min(["*"]))
    assert res.to_series(0).series_equal(pl.Series("min", [1, 2, 3, 2, 1]))
    res = fruits_cars.select([pl.col(pl.datatypes.Int64)]).select(pl.max(["*"]))
    assert res.to_series(0).series_equal(pl.Series("max", [5, 4, 3, 4, 5]))
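The expected values in the test above are the row-wise (horizontal) minimum and maximum across the two integer columns A and B. A plain-Python check of that arithmetic, independent of Polars:

```python
# Integer columns A and B from the fruits_cars example
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

# Row-wise min/max: combine the i-th value of each column
row_min = [min(x, y) for x, y in zip(a, b)]
row_max = [max(x, y) for x, y in zip(a, b)]

print(row_min)  # [1, 2, 3, 2, 1]
print(row_max)  # [5, 4, 3, 4, 5]
```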

Installed versions

---Version info---
Polars: 0.14.28
Index type: UInt32
Platform: macOS-10.16-x86_64-i386-64bit
Python: 3.8.0 (v3.8.0:fa919fdf25, Oct 14 2019, 10:23:27) 
[Clang 6.0 (clang-600.0.57)]
---Optional dependencies---
pyarrow: <not installed>
pandas: 1.3.3
numpy: 1.21.2
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>
sorhawell added the bug and python labels on Nov 15, 2022
@ritchie46 (Member) commented:

closed by #5511
