min()/max() behaves different with nan #328

Closed
elbaro opened this issue Feb 4, 2021 · 4 comments · Fixed by #330

Comments

@elbaro
Contributor

elbaro commented Feb 4, 2021

>>> from math import nan
>>> pypolars.Series([1,2,3,4, nan])
Series: '' [f64]
[
	1
	2
	3
	4
	NaN
]
>>> pypolars.Series([1,2,3,4, nan]).max()
nan
>>> pypolars.Series([1,2,3,4, nan]).min()
1.0
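
For context, here is a minimal sketch of how NaN can make min and max disagree. Under IEEE 754, every ordered comparison involving NaN is false, so a reduction built from a single `<` or `>` test depends on which operand it keeps when the test fails. The helpers `fold_max` and `fold_min` are hypothetical names for illustration only; this is one plausible mechanism, not a claim about how pypolars actually computes these results.

```python
from functools import reduce
from math import isnan, nan

def fold_max(values):
    # Keeps the accumulator only when `acc > x` is true. A NaN `x` makes the
    # test false, so the NaN replaces the accumulator.
    return reduce(lambda acc, x: acc if acc > x else x, values)

def fold_min(values):
    # Takes `x` only when `x < acc` is true. A NaN `x` makes the test false,
    # so the accumulator survives and the NaN is silently dropped.
    return reduce(lambda acc, x: x if x < acc else acc, values)

data = [1.0, 2.0, 3.0, 4.0, nan]
print(fold_max(data))  # nan
print(fold_min(data))  # 1.0
```

Both folds look reasonable in isolation, yet one propagates the trailing NaN and the other ignores it, which matches the asymmetry reported above.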
@alamb

alamb commented Mar 19, 2021

FWIW, pandas seems to ignore NaNs when calculating min/max:

Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pa;
>>> from math import nan
>>> s = pa.Series([1,2,3,4,nan])
>>> s.max()
4.0
>>> s.min()
1.0
>>> 

@elbaro
Contributor Author

elbaro commented Mar 20, 2021

pandas uses NaN to represent missing values. polars has None for this purpose, so I think both max(NaN, 2, 3)=3 and max(NaN, 2, 3)=NaN make sense, as long as we are consistent across all aggregation functions.
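
The two candidate semantics can be sketched in plain Python as follows. The function names `max_propagate` and `max_ignore` are hypothetical and are not part of polars or pandas; the sketch assumes a non-empty list of floats.

```python
from math import isnan, nan

def max_propagate(values):
    # NaN-propagating: any NaN input makes the result NaN.
    values = list(values)
    if any(isnan(v) for v in values):
        return nan
    return max(values)

def max_ignore(values):
    # NaN-ignoring: NaNs are dropped before comparing.
    # Returns None when nothing is left to compare.
    kept = [v for v in values if not isnan(v)]
    return max(kept) if kept else None

print(max_propagate([nan, 2.0, 3.0]))  # nan
print(max_ignore([nan, 2.0, 3.0]))     # 3.0
```

Either choice gives an answer that does not depend on element order, so the same rule can then be applied to min, sum, and the other aggregations.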

@ritchie46
Member

ritchie46 commented Mar 20, 2021

Consistency indeed, plus the option to either ignore or include NaNs. I can imagine this is completely use-case dependent. For instance, I like that numerical tools such as numpy and pytorch offer both options, e.g. mean(array) and nan_mean(array).

@alamb

alamb commented Mar 20, 2021

> I can imagine this is completely use case dependent

That is likely, though I can't think of any use case where an aggregate function like min or max should return null/NaN just because one of its inputs is null/NaN.

nan_mean I can understand, as you may want to choose whether the denominator is the total element count or only the count of non-NaN elements.
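
As a worked example of the denominator question, here is a small pure-Python sketch. NumPy exposes a comparable pair as `numpy.mean` and `numpy.nanmean`; the `nan_mean` function and its `count_all` flag below are hypothetical and only serve to show the two possible denominators. It assumes at least one non-NaN value.

```python
from math import isnan, nan

def mean(values):
    # Plain mean: a single NaN makes the sum, and so the result, NaN.
    return sum(values) / len(values)

def nan_mean(values, count_all=False):
    # Sum only the non-NaN values. `count_all` (hypothetical flag) picks the
    # denominator: False -> number of non-NaN values, True -> all elements.
    kept = [v for v in values if not isnan(v)]
    denom = len(values) if count_all else len(kept)
    return sum(kept) / denom

data = [1.0, 2.0, 3.0, nan]
print(mean(data))                      # nan
print(nan_mean(data))                  # 2.0
print(nan_mean(data, count_all=True))  # 1.5
```

Here the non-NaN sum is 6.0, so dividing by the 3 valid values gives 2.0, while dividing by all 4 elements gives 1.5.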

Sorting is a different story: you may be sorting on a column that contains NaNs while the other columns in those rows are non-null and need to be preserved.
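
A minimal sketch of that sorting case, assuming rows are plain `(value, label)` tuples and that NaN rows should be placed last (both are illustrative assumptions; `nan_last_key` is a hypothetical helper, not a polars API):

```python
from math import isnan, nan

rows = [(3.0, "a"), (nan, "b"), (1.0, "c"), (2.0, "d")]

def nan_last_key(row):
    value = row[0]
    # Key is (is_nan, value): non-NaN rows sort first by value, NaN rows last.
    # 0.0 stands in for NaN so the key never compares against NaN itself.
    return (isnan(value), 0.0 if isnan(value) else value)

ordered = sorted(rows, key=nan_last_key)
print([label for _, label in ordered])  # ['c', 'd', 'a', 'b']
```

The row with the NaN sort key is kept, along with its non-null label, rather than being dropped.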
