-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat:implement sql style 'substr_index' string function #8272
Conversation
---- | ||
NULL | ||
|
||
query ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, can we also have the same tests with empty strings as input and search token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add empty string tests and 0 count tests
/// SUBSTRING_INDEX('www.apache.org', '.', -2) = apache.org | ||
/// SUBSTRING_INDEX('www.apache.org', '.', -1) = org | ||
pub fn substr_index<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
let string_array = as_generic_string_array::<T>(&args[0])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add a defense check args is exactly 3 elements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I add the args len check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Syleechan please take a look into minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 , and tested below in mysql, maybe add them is better
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 4); -> www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); ->
just the same as SELECT substr_index('www.apache.org', 'ac', 2) |
/// SUBSTRING_INDEX('www.apache.org', '.', -1) = org | ||
pub fn substr_index<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
if args.len() != 3 { | ||
return Err(DataFusionError::Internal(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use internal_err!
macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, change to internal_err.
Thanks @Syleechan , we are very close, please address the minor to use error macros, it gives a possibility to backtrace the error |
@alamb please help to merge if there are no other concerns, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @Syleechan
Thanks @Syleechan as well as @comphead and @Ted-Jiang for the review Since @Ted-Jiang is a committer as well, in the future please feel free to merge PRs once they have been reviewed and follow https://arrow.apache.org/datafusion/contributor-guide/index.html#pull-request-overview (as this one has) |
Which issue does this PR close?
Closes #.
Rationale for this change
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring-index:~:text=SUBSTRING_INDEX(str%2Cdelim%2Ccount)
In calcite, the function expression name is substr_index
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?