-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support casting Utf8
to Boolean
#1738
Support casting Utf8
to Boolean
#1738
Conversation
Should I submit a related issue? |
@MazterQyou Yea, we prefer to have related issue submitted for tracking purpose. Thank you. |
Codecov Report
@@ Coverage Diff @@
## master #1738 +/- ##
==========================================
+ Coverage 83.27% 83.30% +0.02%
==========================================
Files 195 196 +1
Lines 55896 55998 +102
==========================================
+ Hits 46549 46649 +100
- Misses 9347 9349 +2
Continue to review full report at Codecov.
|
Submitted and mentioned the related issue. |
@@ -280,6 +280,8 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool { | |||
/// | |||
/// Behavior: | |||
/// * Boolean to Utf8: `true` => '1', `false` => `0` | |||
/// * Utf8 to boolean: `true`, `yes`, `on`, `1` => `true`, `false`, `no`, `off`, `0` => `false`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at C++ Arrow: https://github.com/apache/arrow/blob/b8431fba68e2540b3e57def0bd0ad718652c4b98/cpp/src/arrow/util/value_parsing.h#L92. Seems it only converts "0", "1", "true", "false". Should we follow it and remove "on", "off", "yes", "no"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't come across "on", "off", "yes", "no" in practice. They're language specific, unlike T and F which are computing primitives.
So I'd stick to "1", "true" and their inverses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess removing those rare variants makes sense.
unlike T and F which are computing primitives
Should t
, f
shorter equivalents be left in then? If yes, just t
and f
, or all shorter variants?
What about whitespace trimming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't come across "on", "off", "yes", "no" in practice
Actually, I just remembered that YES
, NO
variants are used in ANSI information_schema
views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @MazterQyou -- this is looking good. A little more polishing and hopefully this can make it into the next release in a day or two
aeb564c
to
ac19755
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code looks good -- thank you @MazterQyou
I left some style comments, but I think this PR is looking ready to go now
ac19755
to
f2d2e41
Compare
f2d2e41
to
65b2d7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good . thanks @MazterQyou
Are we all happy to keep the on
/off
/yes
/no
?
I prefer to stick with "1", "true", "0", "false" only, but not strong opinion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It's the first time for me to learn that there are so many strings can be accepted for BOOLEAN TYPE.
I echo this, but also not a strong opinion |
Can drop this after rebase on commit 486118c "Support casting Utf8 to Boolean (apache#1738)", first released in 16.0.0
Can drop this after rebase on commit 486118c "Support casting Utf8 to Boolean (apache#1738)", first released in 16.0.0
Can drop this after rebase on commit 486118c "Support casting Utf8 to Boolean (apache#1738)", first released in 16.0.0
Which issue does this PR close?
Closes #1740.
Rationale for this change
Casting Utf8 to Boolean is useful for comparison between the two types.
Different Utf8 inputs might be implied as true or false Boolean type; the change implements conversion from strings supported by PostgreSQL.
What changes are included in this PR?
This PR implements Utf8 to Boolean cast, respecting the
cast_options.safe
value, and adds related tests.Are there any user-facing changes?