fix: Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series #20077

Merged
merged 1 commit into from
Dec 2, 2024
31 changes: 11 additions & 20 deletions crates/polars-arrow/src/legacy/kernels/rolling/nulls/mod.rs
@@ -168,38 +168,29 @@ mod test {

         let out = rolling_var(arr, 3, 1, false, None, None);
         let out = out.as_any().downcast_ref::<PrimitiveArray<f64>>().unwrap();
-        let out = out
-            .into_iter()
-            .map(|v| v.copied().unwrap())
-            .collect::<Vec<_>>();
+        let out = out.into_iter().map(|v| v.copied()).collect::<Vec<_>>();

-        assert_eq!(out, &[0.0, 0.0, 2.0, 12.5]);
+        assert_eq!(out, &[None, None, Some(2.0), Some(12.5)]);
Comment on lines -176 to +173
Collaborator Author

Here the original series is [1, null, -1, 4], and the operation is a rolling variance with a window of 3, min_periods=1, and ddof=1.

So the windows are:

  • [1]
  • [1, null]
  • [1, null, -1]
  • [null, -1, 4]

So it is correct that the first two windows output None: each contains only a single valid value, and with ddof=1 the denominator (count - ddof) is zero:

In [50]: print(pl.Series([1]).std())
None
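
The window-by-window reasoning in this comment can be checked with a small standalone sketch. This is not the Polars kernel: `sample_var` and `comment_windows` are hypothetical helpers written for illustration, nulls are modelled as `None`, and the only rule assumed is the one stated above (no result when the number of valid values minus `ddof` is not positive).

```rust
/// Hypothetical helper (not part of Polars): variance of the non-null
/// values in one window, using `ddof` delta degrees of freedom.
fn sample_var(window: &[Option<f64>], ddof: usize) -> Option<f64> {
    // Keep only the valid (non-null) values.
    let valid: Vec<f64> = window.iter().flatten().copied().collect();
    let count = valid.len();
    // If count - ddof is not positive there is no usable denominator.
    if count <= ddof {
        return None;
    }
    let mean = valid.iter().sum::<f64>() / count as f64;
    let sum_sq: f64 = valid.iter().map(|x| (x - mean).powi(2)).sum();
    Some(sum_sq / (count - ddof) as f64)
}

/// The four windows listed in the comment for the series [1, null, -1, 4].
fn comment_windows() -> Vec<Vec<Option<f64>>> {
    vec![
        vec![Some(1.0)],
        vec![Some(1.0), None],
        vec![Some(1.0), None, Some(-1.0)],
        vec![None, Some(-1.0), Some(4.0)],
    ]
}
```

Mapping `sample_var(w, 1)` over `comment_windows()` gives `None, None, Some(2.0), Some(12.5)`, which agrees with the updated assertion; with `ddof = 0` every window has a positive denominator and the results are `Some(0.0), Some(0.0), Some(1.0), Some(6.25)`.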


         let testpars = Some(RollingFnParams::Var(RollingVarParams { ddof: 0 }));
         let out = rolling_var(arr, 3, 1, false, None, testpars.clone());
         let out = out.as_any().downcast_ref::<PrimitiveArray<f64>>().unwrap();
-        let out = out
-            .into_iter()
-            .map(|v| v.copied().unwrap())
-            .collect::<Vec<_>>();
+        let out = out.into_iter().map(|v| v.copied()).collect::<Vec<_>>();

-        assert_eq!(out, &[0.0, 0.0, 1.0, 6.25]);
+        assert_eq!(out, &[Some(0.0), Some(0.0), Some(1.0), Some(6.25)]);

         let out = rolling_var(arr, 4, 1, false, None, None);
         let out = out.as_any().downcast_ref::<PrimitiveArray<f64>>().unwrap();
-        let out = out
-            .into_iter()
-            .map(|v| v.copied().unwrap())
-            .collect::<Vec<_>>();
-        assert_eq!(out, &[0.0, 0.0, 2.0, 6.333333333333334]);
+        let out = out.into_iter().map(|v| v.copied()).collect::<Vec<_>>();
+        assert_eq!(out, &[None, None, Some(2.0), Some(6.333333333333334)]);

         let out = rolling_var(arr, 4, 1, false, None, testpars.clone());
         let out = out.as_any().downcast_ref::<PrimitiveArray<f64>>().unwrap();
-        let out = out
-            .into_iter()
-            .map(|v| v.copied().unwrap())
-            .collect::<Vec<_>>();
-        assert_eq!(out, &[0.0, 0.0, 1.0, 4.222222222222222]);
+        let out = out.into_iter().map(|v| v.copied()).collect::<Vec<_>>();
+        assert_eq!(
+            out,
+            &[Some(0.), Some(0.0), Some(1.0), Some(4.222222222222222)]
+        );
     }

#[test]
@@ -178,10 +178,10 @@ impl<

         let denom = count - ddof;

-        if count == T::zero() {
+        if denom <= T::zero() {
             None
         } else if count == T::one() {
-            NumCast::from(0)
+            Some(T::zero())
Comment on lines -181 to +184
Collaborator Author

With this change, the check matches what the no_nulls/variance.rs kernel does:

let denom = count - NumCast::from(self.ddof).unwrap();
if denom <= T::zero() {
None
} else if end - start == 1 {
Some(T::zero())

         } else if denom <= T::zero() {
             Some(T::infinity())
         } else {
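
To make the effect of the reordered check concrete, here is a minimal sketch of the two branch orders from this hunk. It is a simplification under stated assumptions: `f64` stands in for the generic `T`, `Outcome`, `old_guard`, and `new_guard` are hypothetical names, and the final `else` branch (whose body is not shown in the diff) is represented only by `Outcome::Compute`.

```rust
/// Which branch of the guard is taken (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Outcome {
    Null,     // the kernel returns None
    Zero,     // the kernel returns a zero value
    Infinity, // the kernel returns Some(T::infinity())
    Compute,  // the kernel falls through to the final else branch
}

/// Branch order on the removed lines: only an empty window yields null.
fn old_guard(count: f64, ddof: f64) -> Outcome {
    let denom = count - ddof;
    if count == 0.0 {
        Outcome::Null
    } else if count == 1.0 {
        Outcome::Zero
    } else if denom <= 0.0 {
        Outcome::Infinity
    } else {
        Outcome::Compute
    }
}

/// Branch order on the added lines: a non-positive denominator yields null.
fn new_guard(count: f64, ddof: f64) -> Outcome {
    let denom = count - ddof;
    if denom <= 0.0 {
        Outcome::Null
    } else if count == 1.0 {
        Outcome::Zero
    } else if denom <= 0.0 {
        // Kept to mirror the diff; the first check already covers this case.
        Outcome::Infinity
    } else {
        Outcome::Compute
    }
}
```

For one valid value and `ddof = 1`, `old_guard(1.0, 1.0)` takes the zero branch while `new_guard(1.0, 1.0)` takes the null branch, which is the behaviour change this PR describes. In this sketch the later `denom <= 0.0` branch of `new_guard` can no longer be reached.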
8 changes: 8 additions & 0 deletions py-polars/tests/unit/operations/rolling/test_rolling.py
@@ -807,6 +807,14 @@ def test_rolling() -> None:
     )


+def test_rolling_std_nulls_min_periods_1_20076() -> None:
+    result = pl.Series([1, 2, None, 4]).rolling_std(3, min_periods=1)
+    expected = pl.Series(
+        [None, 0.7071067811865476, 0.7071067811865476, 1.4142135623730951]
+    )
+    assert_series_equal(result, expected)
+
+
 def test_rolling_by_date() -> None:
     df = pl.DataFrame(
         {
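
The expected values in the new Python test can be reproduced with a small reference implementation. This is a sketch, not the Polars kernel: `rolling_std_ref` is a hypothetical function, and it assumes that each window ends at the current element, that `min_periods` counts non-null values, and that a non-positive `count - ddof` yields null.

```rust
/// Hypothetical reference implementation of a rolling standard deviation
/// over nullable values (nulls are `None` and are skipped).
fn rolling_std_ref(
    values: &[Option<f64>],
    window: usize,
    min_periods: usize,
    ddof: usize,
) -> Vec<Option<f64>> {
    (0..values.len())
        .map(|end| {
            // The window covers at most `window` elements ending at `end`.
            let start = (end + 1).saturating_sub(window);
            let valid: Vec<f64> = values[start..=end].iter().flatten().copied().collect();
            let count = valid.len();
            // Null if too few valid values or a non-positive denominator.
            if count < min_periods || count <= ddof {
                return None;
            }
            let mean = valid.iter().sum::<f64>() / count as f64;
            let sum_sq: f64 = valid.iter().map(|x| (x - mean).powi(2)).sum();
            Some((sum_sq / (count - ddof) as f64).sqrt())
        })
        .collect()
}
```

Called as `rolling_std_ref(&[Some(1.0), Some(2.0), None, Some(4.0)], 3, 1, 1)`, the windows are [1], [1, 2], [1, 2, null], and [2, null, 4]. The first has a single valid value and so gives null; the next two reduce to the values {1, 2}, and the last to {2, 4}.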