-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent #9730
Conversation
…ailable feature in DataFusion and building with nightly may not be a good recommendation when getting started.
string_functions
string_functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @Omega359 -- this looks great to me
/// btrim('xyxtrimyyx', 'xyz') = 'trim' | ||
pub fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
general_trim::<T>(args, TrimType::Both) | ||
} | ||
|
||
#[derive(Debug)] | ||
pub(super) struct TrimFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't a huge deal, but I found it a little confusing that the function is called TrimFunc
but has a name of "btrim"
and an alias of "trim"
I would expect either BTrimFunc
or else the name of "trim" and an alias of "btrim"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, this was an oversight on my part when moving the Trim function here. I'll fix.
/// `$largeUtf8Type`, | ||
/// | ||
/// If the input type is `Utf8` or `Binary` the return type is `$utf8Type`, | ||
macro_rules! get_optimal_return_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for cleaning this up
@@ -752,70 +725,6 @@ mod tests { | |||
Int32Array | |||
); | |||
test_function!(BitLength, &[lit("")], Ok(Some(0)), i32, Int32, Int32Array); | |||
test_function!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked and the functions are already tested in expr.slt (e.g. https://github.com/apache/arrow-datafusion/blob/47f4b5a67ac3b327764cbd4c0f42da7ac44854e5/datafusion/sqllogictest/test_files/expr.slt#L558-L574)
Thus removing these tests looks good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually migrating that test_function for another PR (#9702) so I think what I'd like to do is to have a followup PR after that lands to readd those tests into the code. I'm not a fan of relying strictly on the sql tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I actually think the sql coverage is the most important (as it ensures everything works end to end). That being said having additional unit test coverage is not a bad thing either
@@ -919,26 +922,25 @@ lpad(str, n[, padding_str]) | |||
|
|||
### `ltrim` | |||
|
|||
Removes leading spaces from a string. | |||
Trims the specified trim string from the beginning of a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for correcting and rationalizing the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359
Which issue does this PR close?
Closes #9703
Rationale for this change
Function migration.
What changes are included in this PR?
Code and tests.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. The dataframe api now requires a vec![] parameter for all trim functions vs before where it seemed to be mixed between vec and not.
There was a mismatch between the sql interface, the dataframe api and the documentation around these functions. As you can see in the expr.slt file all the trim functions could be provided with a second argument which would be the string to use when trimming. However, the documentation and the dataframe api only seems to have that for btrim, not the other two functions.
This PR aligns the documentation and the dataframe with the sql so that all accept a second (optional) argument.