.list.to_struct() has non-deterministic behavior #16450
Comments
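It's happening because, by default, .list.to_struct() takes the number of fields from the first non-null list, and group_by does not maintain row order unless maintain_order=True is passed, so the first row can differ between runs:
df.group_by("a", "b").agg(pl.col("value").bottom_k(3))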
# shape: (4, 3)
# ┌─────┬─────┬───────────┐
# │ a ┆ b ┆ value │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ list[i64] │
# ╞═════╪═════╪═══════════╡
# │ 2 ┆ A ┆ [99] │
# │ 2 ┆ B ┆ [3] │
# │ 1 ┆ A ┆ [1, 2] │
# │ 1 ┆ B ┆ [4, 98] │
# └─────┴─────┴───────────┘
df.group_by("a", "b").agg(pl.col("value").bottom_k(3))
# shape: (4, 3)
# ┌─────┬─────┬───────────┐
# │ a ┆ b ┆ value │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ list[i64] │
# ╞═════╪═════╪═══════════╡
# │ 1 ┆ A ┆ [1, 2] │
# │ 2 ┆ B ┆ [3] │
# │ 1 ┆ B ┆ [4, 98] │
# │ 2 ┆ A ┆ [99] │
# └─────┴─────┴───────────┘
You would need .list.to_struct("max_width", upper_bound=3) so that the width no longer depends on row order.
Makes sense now! Maybe (?) it's not a bug, but we might consider adding a warning to the documentation, since it's a surprising consequence.
Yeah, it is a bit of a footgun. A Notes section with an explanation was added, but perhaps a Warning should also be added that points to that section.
Agreed. Would it make sense for … It would behave similarly to …
Yeah, something similar came up in #15742 recently. The struct can be spelled out by hand:
pl.struct(
    field_0=pl.col("value").list.get(0),
    field_1=pl.col("value").list.get(1),
    field_2=pl.col("value").list.get(2),
)
But not having to type all that out seems like it would be useful.
Reproducible example
Log output
No response
Issue description
The bug occurs whether or not upper_bound=3 is specified. That is, replacing … with … will also reproduce the bug.
Expected behavior
The bug is that both of the following outputs are possible.
In my opinion, the "shape: (4, 4)" result is correct, but it is difficult to say what's expected without knowing why the non-determinism occurs in the first place.
Installed versions