Convert BuiltInWindowFunction::{Lead, Lag} to a user defined window function #12857

Merged
alamb merged 79 commits into apache:main from the convert-lead-lag-udwf branch
Oct 18, 2024

@jcsherin jcsherin commented Oct 10, 2024

Which issue does this PR close?

Closes #12802.

Rationale for this change

Same as #12802.

What changes are included in this PR?

  • Converts lead and lag to user-defined window functions.
  • Adds support for serializing the logical plan for window expressions with more than one argument.
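For readers unfamiliar with the two functions being converted, here is a minimal sketch of their semantics over a single, already-ordered partition. The names `lag_sketch` and `lead_sketch` are hypothetical and are not DataFusion APIs. The sketch assumes the usual behaviour: a row whose offset falls outside the partition receives the default value, and in-range NULLs are returned unchanged.

```rust
/// Hypothetical helper (not a DataFusion API): `lag` over one ordered partition.
/// Row `i` receives the value from row `i - offset`, or `default` when that
/// row would fall before the start of the partition.
fn lag_sketch<T: Clone>(values: &[Option<T>], offset: usize, default: Option<T>) -> Vec<Option<T>> {
    (0..values.len())
        .map(|i| {
            if i >= offset {
                values[i - offset].clone()
            } else {
                default.clone()
            }
        })
        .collect()
}

/// Hypothetical helper (not a DataFusion API): `lead` over one ordered partition.
/// Row `i` receives the value from row `i + offset`, or `default` when that
/// row would fall past the end of the partition.
fn lead_sketch<T: Clone>(values: &[Option<T>], offset: usize, default: Option<T>) -> Vec<Option<T>> {
    (0..values.len())
        .map(|i| match i.checked_add(offset) {
            Some(j) if j < values.len() => values[j].clone(),
            _ => default.clone(),
        })
        .collect()
}
```

With the values 10, 20, 30, an offset of 1, and no default, `lag_sketch` yields NULL, 10, 20; with a default of 0, `lead_sketch` yields 20, 30, 0.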

Are these changes tested?

  • Adds logical plan roundtrip tests.

Are there any user-facing changes?

Yes.

@alamb alamb added the "api change" label (Changes the API exposed to users of the crate) Oct 15, 2024
@jcsherin
Contributor Author

Thanks @alamb 🙏

@alamb
Contributor

alamb commented Oct 16, 2024

I merged this branch up to resolve some conflicts (due to #12893)

@jcsherin
Contributor Author

@alamb Before we merge this PR, I think it will be better to merge the bug fix done in BuiltInWindowFunction::{Lead,Lag} here:

I can port the bug fix over to this same PR after #12811 is merged. After that we can merge this to main.

Is that ok?

@alamb
Contributor

alamb commented Oct 16, 2024

> @alamb Before we merge this PR, I think it will be better to merge the bug fix done in BuiltInWindowFunction::{Lead,Lag} here:
>
> I can port the bug fix over to this same PR after #12811 is merged. After that we can merge this to main.
>
> Is that ok?

Sounds like a plan. I just merged #12811 -- just let me know when this PR is ready

@alamb alamb marked this pull request as draft October 16, 2024 16:19
@alamb
Contributor

alamb commented Oct 16, 2024

Marking as draft so we don't accidentally merge it before the fixes are complete

@alamb alamb marked this pull request as ready for review October 16, 2024 18:31
@alamb alamb marked this pull request as draft October 16, 2024 18:31
@alamb
Contributor

alamb commented Oct 16, 2024

CI tests are still failing it seems

Comment on lines 184 to 188
// See https://github.com/apache/datafusion/pull/12811
let (_expr, return_type) = rewrite_null_expr_and_data_type(
    partition_evaluator_args.input_exprs(),
    return_type,
)?;
Contributor Author

The above bug fix in BuiltInWindowFunction::{Lead, Lag} rewrites the NULL expression to match the type of the default value (if provided).

Right now WindowUDF has no way to rewrite the arguments passed to a user-defined window function. So for the moment this fix cannot be applied, and the newly added regression sqllogictests will fail.

I'll have to look into bridging the gap between WindowUDF and BuiltInWindowFunction in this case.
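To make the rewrite concrete, here is a minimal, self-contained sketch of the idea. It does not use DataFusion types: `SketchType`, `SketchExpr`, and `rewrite_null_sketch` are hypothetical stand-ins, and the sketch assumes the default value is the third argument, as in `lag(expr, offset, default)`. The actual logic lives in `rewrite_null_expr_and_data_type` and may differ in detail.

```rust
/// Hypothetical, simplified data types (not DataFusion's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Null,
    Int64,
}

/// Hypothetical, simplified expressions (not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum SketchExpr {
    /// A NULL literal; `SketchType::Null` means it is still untyped.
    NullLiteral(SketchType),
    Int64Literal(i64),
    Column(String, SketchType),
}

impl SketchExpr {
    fn data_type(&self) -> SketchType {
        match self {
            SketchExpr::NullLiteral(t) | SketchExpr::Column(_, t) => t.clone(),
            SketchExpr::Int64Literal(_) => SketchType::Int64,
        }
    }
}

/// If the first argument is an untyped NULL and a typed default (assumed to be
/// the third argument) is present, retype the NULL to the default's type.
/// Returns the possibly rewritten expression and its type, or `None` when
/// there are no arguments at all.
fn rewrite_null_sketch(args: &[SketchExpr]) -> Option<(SketchExpr, SketchType)> {
    let value = args.first()?;
    let rewritten = match (value, args.get(2)) {
        (SketchExpr::NullLiteral(SketchType::Null), Some(default))
            if default.data_type() != SketchType::Null =>
        {
            SketchExpr::NullLiteral(default.data_type())
        }
        // Anything else is passed through unchanged.
        _ => value.clone(),
    };
    let return_type = rewritten.data_type();
    Some((rewritten, return_type))
}
```

In this sketch, the arguments `(NULL, 1, 0)` produce an Int64-typed NULL with return type Int64, while a lone untyped NULL is left as-is.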

@jcsherin
Contributor Author

@alamb

> CI tests are still failing it seems

The bug fix relies on rewriting the NULL expression passed as an argument to lead or lag to match the data type of the default value.

Right now this feature is missing in WindowUDF. To be specific, WindowUDF has no equivalent of the expressions method in BuiltInWindowFunctionExpr:

fn expressions(&self) -> Vec<Arc<dyn PhysicalExpr>> {
    vec![Arc::clone(&self.expr)]
}

I think we need to add a similar API in WindowUDF to complete this bugfix.

For now I have disabled the new sqllogictests and marked them as TODO in 270a203.

I'll do a follow on PR to add this feature and squash the bug (again). Hope that is alright?

@alamb
Contributor

alamb commented Oct 16, 2024

> I'll do a follow on PR to add this feature and squash the bug (again). Hope that is alright?

Yes, thank you

@alamb
Contributor

alamb commented Oct 16, 2024

Or maybe you could just make the change in this PR. The downside of merging this as-is is that it (temporarily) reintroduces a bug. Given that we are planning the 43 release (#12470) in the next few days, it would be nice to avoid breaking a previous bug fix.

@jcsherin
Contributor Author

That works. Will keep this PR as draft until the bug fix is complete. Thanks 👍

@jcsherin
Contributor Author

  1. Ported the bug fix for handling NULL input in 814a9b7, as a single commit so the isolated diff for the bug fix is easy to see.
  2. Added a few more test cases in c4b8840.
  3. Added WindowUDFImpl::expressions. It has a default implementation, so it does not break existing user-defined window function implementations.
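The backward-compatibility point in item 3 can be illustrated with a small sketch of a trait method that has a default body. Everything here is a hypothetical stand-in (`SketchPhysicalExpr`, `SketchArgs`, `SketchWindowUdf`, `ExistingUdwf`, `LagLike`) and is not the actual `WindowUDFImpl` signature. The assumption is only that the default passes the input expressions through unchanged, so implementors that never mention `expressions` keep compiling, while a lag-like function can override it to rewrite its first argument.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical expression.
trait SketchPhysicalExpr {
    fn name(&self) -> String;
}

struct Named(String);

impl SketchPhysicalExpr for Named {
    fn name(&self) -> String {
        self.0.clone()
    }
}

/// Convenience constructor for the sketch.
fn named(name: &str) -> Arc<dyn SketchPhysicalExpr> {
    Arc::new(Named(name.to_string()))
}

/// Collects expression names so results are easy to compare.
fn names(exprs: &[Arc<dyn SketchPhysicalExpr>]) -> Vec<String> {
    exprs.iter().map(|e| e.name()).collect()
}

/// Hypothetical stand-in for the arguments handed to the window function.
struct SketchArgs {
    input_exprs: Vec<Arc<dyn SketchPhysicalExpr>>,
}

/// Hypothetical stand-in for the user-defined window function trait.
trait SketchWindowUdf {
    fn name(&self) -> &str;

    /// Default body: pass the input expressions through unchanged.
    /// Existing implementors do not need to define this method.
    fn expressions(&self, args: &SketchArgs) -> Vec<Arc<dyn SketchPhysicalExpr>> {
        args.input_exprs.clone()
    }
}

/// An implementor written before `expressions` existed; it still compiles.
struct ExistingUdwf;

impl SketchWindowUdf for ExistingUdwf {
    fn name(&self) -> &str {
        "existing_udwf"
    }
}

/// A lag-like implementor that overrides the hook to retype a NULL first argument.
/// The Int64 target type is hard-coded purely for illustration.
struct LagLike;

impl SketchWindowUdf for LagLike {
    fn name(&self) -> &str {
        "lag_like"
    }

    fn expressions(&self, args: &SketchArgs) -> Vec<Arc<dyn SketchPhysicalExpr>> {
        args.input_exprs
            .iter()
            .enumerate()
            .map(|(i, e)| {
                if i == 0 && e.name() == "NULL" {
                    named("CAST(NULL AS Int64)")
                } else {
                    Arc::clone(e)
                }
            })
            .collect()
    }
}
```

Calling `expressions` on `ExistingUdwf` returns the inputs untouched, whereas `LagLike` replaces only a leading `NULL` expression.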

@alamb This is ready for review again.

@alamb alamb marked this pull request as ready for review October 17, 2024 20:41
Contributor

@alamb alamb left a comment

Makes sense to me -- thank you (again) @jcsherin

use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
use std::sync::Arc;

/// Arguments passed to user-defined window function
Contributor

👍

@alamb alamb merged commit efe5708 into apache:main Oct 18, 2024
26 checks passed
@jcsherin jcsherin deleted the convert-lead-lag-udwf branch October 18, 2024 12:01
Labels
api change: Changes the API exposed to users of the crate
core: Core DataFusion crate
logical-expr: Logical plan and expressions
physical-expr: Physical Expressions
proto: Related to proto crate
sqllogictest: SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Convert BuiltInWindowFunction::{Lead, Lag} to a user defined window function
2 participants