My use case: I am compiling genomic metadata across about a dozen studies. A lot of studies have internally conflicting metadata due to types, e.g. saying sample "SAMN0001" is both resistant and not-resistant to some antibiotic. If an antibiotic's column is a list of length two or more, I want to throw out that row's metadata by overwriting the list at that column with pl.Null. Right now, it seems that can't be done (but there might be a workaround by chaining a few more expressions?)
Expected behavior
A list's length should only be the length of what's actually in it, including null values. When a list is overwritten to be a single pl.Null value, the length of that "list" should be 0, not what it was prior. In other words:
If pl.when(pl.col("b").list.len() <= 1).then(pl.col("b")).otherwise(None).alias("nulled_b") is actually setting the value to something like [pl.Null, pl.Null] instead of pl.Null, that raises some additional issues:
[pl.Null, pl.Null] is being printed as null instead of [null, null], which is neither clear nor consistent with how [1, 2, null] gets printed
It seems to imply there isn't a way to set a list to something with length 0, unless it was defined that way during dataframe creation (like the second row of "a" in the example)
Checks
Reproducible example
Log output
Issue description
Completely overwriting a list as pl.Null results in the list's length being considered what it was prior to the overwrite.
Related but not quite the same: #18522
Installed versions