-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support reading Substrait plans with Window Functions #165
Conversation
BREAKING CHANGE: various public functions that took the AggregateFunction.AggregationInvocation proto now take the POJO equivalent Expression.AggregationInvocation.
bound = | ||
Expression.WindowFunction.Bound.newBuilder() | ||
.setUnbounded(unboundedProto) | ||
.setPreceding(preceding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was problematic for two reasons:
unbounded
andproceeding
are both part of the same one of, so both should not be set.- The default offset for
preceding
is set to 0 here, which is dissalowed per the spec (offset must be a positive integer). I noticed this issue when I added theLiteralToWindowBoundOffset
converter which checked for the positivity of the offset.
var offset = boundedWindowBound.offset(); | ||
boolean isPreceding = boundedWindowBound.direction() == WindowBound.Direction.PRECEDING; | ||
io.substrait.expression.Expression.I32Literal offsetLiteral = | ||
(io.substrait.expression.Expression.I32Literal) offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast of the didn't always succeed. Reading plans in from protos, the offset is encoded as a long which is converted to an I64Literal. Additionally, depending on how Calcite encodes the offset as well, these could potentially be I8 or I16. The possibilities are handled in the LiteralToWindowBoundOffset visitor.
d95c04f
to
7d0ed4b
Compare
.partitionBy(partitionExprs) | ||
.orderBy(sortFields) | ||
.lowerBound(lowerBound) | ||
.upperBound(upperBound) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that WindowFunctionInvocation
and Window
both share so many of the same fields seem like a bit of a code smell. I think it's possible to reduce the duplication, but decided to leave that off for a different change.
7d0ed4b
to
6ed43ef
Compare
6ed43ef
to
9c65d6b
Compare
} | ||
|
||
@Value.Immutable | ||
abstract static class BoundedWindowBound implements WindowBound { | ||
|
||
@Override | ||
public BoundedKind boundedKind() { | ||
return BoundedKind.UNBOUNDED; | ||
return BoundedKind.BOUNDED; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an existing bug which caused round-trip failures.
@@ -420,7 +421,8 @@ public Expression visit(io.substrait.expression.Expression.Window expr) throws R | |||
expr.partitionBy().stream() | |||
.map(e -> e.accept(this)) | |||
.collect(java.util.stream.Collectors.toList()); | |||
var builder = Expression.WindowFunction.newBuilder(); | |||
var outputType = expr.getType().accept(typeProtoConverter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The missing output type caused round-trip failures.
The changes in this PR allows substrait-java to read in Substrait plans containing window functions (the existing machinary only allowed for generating these plans). Various roundtrip tests are included in WindowFunctionTest. These tests helped my identify some bugs in the existing code to convert POJO window bounds to PROTO window bounds. I've simplified the handling of these as well. |
functionVariant = lookup.getWindowFunction(functionReference, extensions); | ||
} catch (RuntimeException e) { | ||
// TODO: Ideally we shouldn't need to catch a RuntimeException to be able to attempt our | ||
// second lookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as part of this PR, but I think it would be good to move substrait-java and isthmus away from throwing RuntimeExceptions and towards a hierarchy of checked exceptions to improve library economics.
@jinfengni , can you please review? |
Note: this work depends on the refactor in #164 |
PR to add this behaviour to the spec is substrait-io/substrait#540 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
35b276b
to
033ff03
Compare
…t-io#165) * fix: incorrect BoundedKind for BoundedWindowBound * feat: add mapping for sum0 aggregate function * feat(pojos): convenience methods for working with Window Bounds
No description provided.