Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent #9730

Merged
merged 11 commits into from
Mar 22, 2024

Conversation

Omega359
Copy link
Contributor

Which issue does this PR close?

Closes #9703

Rationale for this change

Function migration.

What changes are included in this PR?

Code and tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. The dataframe api now requires a vec![] parameter for all trim functions vs before where it seemed to be mixed between vec and not.

There was a mismatch between the sql interface, the dataframe api and the documentation around these functions. As you can see in the expr.slt file all the trim functions could be provided with a second argument which would be the string to use when trimming. However, the documentation and the dataframe api only seems to have that for btrim, not the other two functions.

This PR aligns the documentation and the dataframe with the sql so that all accept a second (optional) argument.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate labels Mar 21, 2024
@Omega359 Omega359 marked this pull request as ready for review March 21, 2024 19:52
@alamb alamb added the api change Changes the API exposed to users of the crate label Mar 22, 2024
@alamb alamb changed the title Move trim functions (btrim, ltrim, rtrim) to datafusion_functions Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent Mar 22, 2024
@alamb alamb changed the title Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent, add string_functions Mar 22, 2024
@alamb alamb changed the title Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent, add string_functions Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent Mar 22, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @Omega359 -- this looks great to me

/// btrim('xyxtrimyyx', 'xyz') = 'trim'
pub fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
general_trim::<T>(args, TrimType::Both)
}

#[derive(Debug)]
pub(super) struct TrimFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't a huge deal, but I found it a little confusing that the function is called TrimFunc but has a name of "btrim" and an alias of "trim"

I would expect either BTrimFunc or else the name of "trim" and an alias of "btrim"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urgh, this was an oversight on my part when moving the Trim function here. I'll fix.

/// `$largeUtf8Type`,
///
/// If the input type is `Utf8` or `Binary` the return type is `$utf8Type`,
macro_rules! get_optimal_return_type {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for cleaning this up

@@ -752,70 +725,6 @@ mod tests {
Int32Array
);
test_function!(BitLength, &[lit("")], Ok(Some(0)), i32, Int32, Int32Array);
test_function!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked and the functions are already tested in expr.slt (e.g. https://github.com/apache/arrow-datafusion/blob/47f4b5a67ac3b327764cbd4c0f42da7ac44854e5/datafusion/sqllogictest/test_files/expr.slt#L558-L574)

Thus removing these tests looks good to me 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually migrating that test_function for another PR (#9702) so I think what I'd like to do is to have a followup PR after that lands to readd those tests into the code. I'm not a fan of relying strictly on the sql tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I actually think the sql coverage is the most important (as it ensures everything works end to end). That being said having additional unit test coverage is not a bad thing either

@@ -919,26 +922,25 @@ lpad(str, n[, padding_str])

### `ltrim`

Removes leading spaces from a string.
Trims the specified trim string from the beginning of a string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for correcting and rationalizing the documentation

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Omega359

@alamb alamb merged commit d321ba3 into apache:main Mar 22, 2024
25 checks passed
@Omega359 Omega359 deleted the feature/9703 branch March 24, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move btrim, ltrim and rtrim to datafusion-functions
2 participants