Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support reading Substrait plans with Window Functions #165

Merged
merged 7 commits into from
Aug 17, 2023

Conversation

vbarua
Copy link
Member

@vbarua vbarua commented Aug 10, 2023

No description provided.

vbarua added 2 commits August 9, 2023 13:47
BREAKING CHANGE: various public functions that took the
AggregateFunction.AggregationInvocation proto now take the POJO
equivalent Expression.AggregationInvocation.
bound =
Expression.WindowFunction.Bound.newBuilder()
.setUnbounded(unboundedProto)
.setPreceding(preceding)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was problematic for two reasons:

  1. unbounded and proceeding are both part of the same one of, so both should not be set.
  2. The default offset for preceding is set to 0 here, which is dissalowed per the spec (offset must be a positive integer). I noticed this issue when I added the LiteralToWindowBoundOffset converter which checked for the positivity of the offset.

var offset = boundedWindowBound.offset();
boolean isPreceding = boundedWindowBound.direction() == WindowBound.Direction.PRECEDING;
io.substrait.expression.Expression.I32Literal offsetLiteral =
(io.substrait.expression.Expression.I32Literal) offset;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast of the didn't always succeed. Reading plans in from protos, the offset is encoded as a long which is converted to an I64Literal. Additionally, depending on how Calcite encodes the offset as well, these could potentially be I8 or I16. The possibilities are handled in the LiteralToWindowBoundOffset visitor.

@vbarua vbarua force-pushed the vbarua/window-function-support branch from d95c04f to 7d0ed4b Compare August 10, 2023 22:39
.partitionBy(partitionExprs)
.orderBy(sortFields)
.lowerBound(lowerBound)
.upperBound(upperBound)
.build();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that WindowFunctionInvocation and Window both share so many of the same fields seem like a bit of a code smell. I think it's possible to reduce the duplication, but decided to leave that off for a different change.

@vbarua vbarua force-pushed the vbarua/window-function-support branch from 7d0ed4b to 6ed43ef Compare August 10, 2023 22:56
@vbarua vbarua force-pushed the vbarua/window-function-support branch from 6ed43ef to 9c65d6b Compare August 10, 2023 22:58
@vbarua vbarua marked this pull request as ready for review August 10, 2023 23:00
@vbarua vbarua requested a review from jacques-n August 10, 2023 23:00
}

@Value.Immutable
abstract static class BoundedWindowBound implements WindowBound {

@Override
public BoundedKind boundedKind() {
return BoundedKind.UNBOUNDED;
return BoundedKind.BOUNDED;
}
Copy link
Member Author

@vbarua vbarua Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an existing bug which caused round-trip failures.

@@ -420,7 +421,8 @@ public Expression visit(io.substrait.expression.Expression.Window expr) throws R
expr.partitionBy().stream()
.map(e -> e.accept(this))
.collect(java.util.stream.Collectors.toList());
var builder = Expression.WindowFunction.newBuilder();
var outputType = expr.getType().accept(typeProtoConverter);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The missing output type caused round-trip failures.

@vbarua
Copy link
Member Author

vbarua commented Aug 10, 2023

The changes in this PR allows substrait-java to read in Substrait plans containing window functions (the existing machinary only allowed for generating these plans).

Various roundtrip tests are included in WindowFunctionTest. These tests helped my identify some bugs in the existing code to convert POJO window bounds to PROTO window bounds. I've simplified the handling of these as well.

functionVariant = lookup.getWindowFunction(functionReference, extensions);
} catch (RuntimeException e) {
// TODO: Ideally we shouldn't need to catch a RuntimeException to be able to attempt our
// second lookup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as part of this PR, but I think it would be good to move substrait-java and isthmus away from throwing RuntimeExceptions and towards a hierarchy of checked exceptions to improve library economics.

@jacques-n
Copy link
Collaborator

@jinfengni , can you please review?

@vbarua
Copy link
Member Author

vbarua commented Aug 11, 2023

Note: this work depends on the refactor in #164

@vbarua
Copy link
Member Author

vbarua commented Aug 14, 2023

PR to add this behaviour to the spec is substrait-io/substrait#540

Copy link
Member

@EpsilonPrime EpsilonPrime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Base automatically changed from vbarua/pojo-for-aggregation-phase to main August 14, 2023 23:35
@vbarua vbarua force-pushed the vbarua/window-function-support branch from 35b276b to 033ff03 Compare August 17, 2023 20:16
@vbarua vbarua merged commit 93c6db5 into main Aug 17, 2023
@vbarua vbarua deleted the vbarua/window-function-support branch August 17, 2023 20:43
ajegou pushed a commit to ajegou/substrait-java that referenced this pull request Mar 29, 2024
…t-io#165)

* fix: incorrect BoundedKind for BoundedWindowBound
* feat: add mapping for sum0 aggregate function
* feat(pojos): convenience methods for working with Window Bounds
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants