-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Ergonomic casts from Categorical
to Enum
#20645
feat: Ergonomic casts from Categorical
to Enum
#20645
Conversation
I am not sure about this. The fields in an enum belong to the data type. In a categorical they are inferred at runtime. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20645 +/- ##
=======================================
Coverage 79.01% 79.01%
=======================================
Files 1557 1557
Lines 220552 220560 +8
Branches 2514 2514
=======================================
+ Hits 174274 174284 +10
+ Misses 45705 45703 -2
Partials 573 573 ☔ View full report in Codecov by Sentry. |
Yup; it's an ergonomic improvement to something that may not actually be a good idea, heh. I'll park in Draft so you can have a think about it; not a big PR - feel free to close if you prefer not to merge ;) |
I don't think "lexically sorted enum"s exist. That will panic in many places. |
FYI there's a little relevant discussion in #14952. |
As in, the Enum gets created with the values having been sorted (as the same Categorical values coming from several sources could have a very different physical ordering, but -if we were to merge this- we'd want the Enum equivalent to resolve consistently). I could have worded that better ;) |
I just wanted to freeze an existing categorical encoding, e.g. the output of cut(). Definitely didn’t imagine it would change the ordering. |
They're not quite the same thing; casting an arbitrary Categorical vs freezing a specific Categorical. @ritchie46: What do you think about a dedicated A dedicated method could also have a parameter to indicate whether the categories should be sorted or taken "as-is" too. Something like this: pl.col("countries").cat.to_enum(order="lexical")
pl.col("countries").cat.to_enum(sort=True) # or just a bool? |
There is no thing as freezing the categorical to enum. In a categorical, order of categories doesn't matter (currently it does, but we will fix that, in global it doesn't). In an enum the categories are ordered. I want people to be deliberate about that. If they are casting to enum currently because they can circumvent some issues with categorical, that's not a good reason to implement this. We need to improve categorical. |
Indeed; hence the suggestion to move it out into an explicit/dedicated method that allows them to do that, rather than have it live in cast. Not to worry, will close this out and it can be revisited some other time 👌 |
ref: #19868 (comment).
Allow for stable ergonomic casts from
Categorical
toEnum
.If no categories are defined, we read the
Categorical
rev_map and normalise to a lexically sortedEnum
(which ensures that if the same categories appear in a different order they will return the sameEnum
).Example
Before:
(note that we have to sort the categories to ensure a stable Enum)
After:
(no boilerplate, and the resulting enums will be stable/sorted)