-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(rust): Refactor AnyValue
casting logic
#13140
Conversation
.into_date() | ||
.into_series()), | ||
}, | ||
(Datetime(tu, _), Date) => Ok(match tu { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a minor ergonomic improvement.
I'm not sure about disallowing those casts. In the end, they're just integers under the hood. I'll mark this as a bug fix, though I'm not sure this should be 'fixed'. |
@stinodego isn't that just an implementation detail though? I'm noticing that we do allow casting from categorical to date and such, despite that not making much sense. Should we impose rules for logical casting based on whether they make sense, or whether they're possible due to the underlying structure? |
(also, I'm willing to add those casts back in--the main point of this PR was to organize the code and to improve the cast testing). |
It would be really helpful if each PR does only one thing. So if you would limit this PR to the refactor, and then followup with a PR for the casting update (or create an issue to discuss first), that would be great! |
@stinodego Absolutely, thanks for the helpful comments! |
@stinodego I just simplified the code a bit and re-added back the disabled casts. I think the resulting code is a bit easier to follow and manage, as it removed some indirection and additional functions/macros. |
@mcrumiller would you mind rebasing this? I recently ran into this mess - it would be nice if the casting logic was a bit cleaner. I think you made some good improvements here. |
Hi @stinodego sure, it may take a little while though as I think there have been a few changes to this since I've updated it and I'll have to understand how they impact the logic. |
AnyValue
casting logic
Thanks, and apologies for getting back to this so late. |
Not necessary! There is a sea of PRs. |
@stinodego any idea how to run a test on these? Cast AnyValue::String(s) if dtype == &DataType::Binary => AnyValue::Binary(s.as_bytes()),
_ => {
polars_bail!(ComputeError: "cannot cast any-value '{:?}' to '{:?}'", self.dtype(), dtype)
}, And indeed, when I run a simple script to cast a literal string to int: import polars as pl
print(pl.select(pl.lit("1").cast(pl.Int32))) That code path is hit:
This all happens through the optimization steps, and so my guess is that it's trying to perform the literal cast, can't, and then later expands the literal to a series and then performs the cast, which is why it works. What other code paths reach and rely on the AnyValue cast? |
The constructors will rely on this in the future (see But it's hard to test this directly. |
I'm also noting (for the future) that there's no currently no quick exit for the identity map, e.g. |
@stinodego I did my best to align what I had with the existing cast structure. It's possible there may be some minor changes, but I did not touch the test suite and it looks like everything is passing. I left quite a few I'm not completely convinced that the new structure in this refactor is actually better than what was there before--it's at least organized, but the tradeoff for reducing the number of function calls is that our matches now require (usually) two matches per arm. If we only matched on the Anyway, let me know of any suggested changes/improvements! I can remove the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on the tests. If you could update this, I will take another look and probably shuffle some things around and merge it.
@stinodego I believe I have addressed your issues. |
1894ce0
to
071d7eb
Compare
071d7eb
to
3fde519
Compare
This PR improves the organization of the
AnyValue
casting logic. The current logic uses a few helper functions that are somewhat sporadically called, and it makes it a little hard to detect where holes in our casting logic exist.This implementation has the general format of
match (from_dtype, to_dtype)
, sorted byfrom_dtype
, with logic to deal with special cases first, and the non-special cases at the end to catch all remaining. This both removes a few function calls and makes it clearer where future casts should be placed. Overall, the result is a bit cleaner with fewer LOC, fewer function calls, and fewer total matche branches.