-
Notifications
You must be signed in to change notification settings - Fork 845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unify substring for binary&utf8 #4442
Conversation
7c0c8ab
to
a89686a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jackwener -- the idea here is great and reducing code is 💯
My only concern is about utf8 boundary checking
maybe @tustvold has some thoughts
3f0a1cc
to
34615bb
Compare
Thanks for @alamb review, I have added code to check just when array is StringArray/LargeStringArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jackwener -- looks great to me
@@ -1020,4 +968,17 @@ mod tests { | |||
let err = substring(&array, 0, Some(5)).unwrap_err().to_string(); | |||
assert!(err.contains("invalid utf-8 boundary")); | |||
} | |||
|
|||
#[test] | |||
fn non_utf8_bytes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
After arrow-rs have
GenericByteArray
, we can unifysubstring
ofbinary
&utf8
.Are there any user-facing changes?