-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: fix prose for extended expression #434
docs: fix prose for extended expression #434
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjones127 added a few comments
b395e12
to
ac372ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% certain of my interpretation on any of these points so feel free to push back.
|
||
## Referred expression | ||
|
||
Extended expressions can contain one or multiple expressions. A single expression is used for filters, while multiple expressions are used for projections and aggregations. Each of these expressions is represented by an `ExpressionReference` message, which can either contain an [Expression](https://github.com/substrait-io/substrait/blob/7f272f13f22cd5f5842baea42bcf7961e6251881/proto/substrait/algebra.proto) or [AggregateFunction](https://github.com/substrait-io/substrait/blob/7f272f13f22cd5f5842baea42bcf7961e6251881/proto/substrait/algebra.proto#L1170) message. In the future, other types might be allowed to handle different use cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single expression is used for filters, while multiple expressions are used for projections and aggregations.
This seems overly prescriptive. For example, I feel like this implies you would only use this for one of those three things. However, someone might want to use this for expression evaluation that is completely unrelated to execution plans or engines.
Maybe something along the lines of "if multiple expressions are specified then there is no implicit relationship between the different top-level expressions. Each expression should be considered independent and operates on the same top-level schema."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can offer that as example usage, but not present as only valid use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I will also add that note about the expressions not depending on each other.
|
||
For multi expressions, user can translate them following same order as it occurs in original plan rel. But it does NOT require the consume side to handle it strictly in previous order. Only need to make sure columns in final output are organized in same order as defined in extended expression message. | ||
If there are multiple expressions, the order of the inputs and outputs matter. Input data will match the schema order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "the order of the inputs matter" mean?
I agree that the output schema must align with the expression order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to "the field order of the input and output data matters".
afcf9fd
to
f713073
Compare
f713073
to
9278d1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few more minor nits.
@@ -19,33 +19,42 @@ message ExpressionReference { | |||
AggregateFunction measure = 2; | |||
} | |||
// Field names in depth-first order | |||
// | |||
// For primitive types, this will just be a single name. Nested types may | |||
// have multiple names. | |||
repeated string output_names = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a required field? The description below seems to suggest it is but it might not hurt to mention that here.
CC @yma11 did you want to confirm this description matches your understanding? |
Co-authored-by: Weston Pace <[email protected]>
You description is quite detailed and accurate for extended expression usage. Just one point: |
No progress has been made in more than six months. Closing without prejudice. |
Follow up on #405
#405 (comment)