Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: fix prose for extended expression #434

Closed

Conversation

wjones127
Copy link
Contributor

Follow up on #405

#405 (comment)

Copy link
Contributor

@vibhatha vibhatha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjones127 added a few comments

@wjones127 wjones127 force-pushed the docs-extended-expressions-grammar branch from b395e12 to ac372ed Compare February 2, 2023 18:36
@wjones127 wjones127 marked this pull request as ready for review February 2, 2023 18:58
@wjones127 wjones127 requested a review from vibhatha February 2, 2023 18:59
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% certain of my interpretation on any of these points so feel free to push back.

proto/substrait/extended_expression.proto Outdated Show resolved Hide resolved
proto/substrait/extended_expression.proto Outdated Show resolved Hide resolved
proto/substrait/extended_expression.proto Outdated Show resolved Hide resolved
proto/substrait/extended_expression.proto Outdated Show resolved Hide resolved
site/docs/expressions/extended_expression.md Outdated Show resolved Hide resolved

## Referred expression

Extended expressions can contain one or multiple expressions. A single expression is used for filters, while multiple expressions are used for projections and aggregations. Each of these expressions is represented by an `ExpressionReference` message, which can either contain an [Expression](https://github.com/substrait-io/substrait/blob/7f272f13f22cd5f5842baea42bcf7961e6251881/proto/substrait/algebra.proto) or [AggregateFunction](https://github.com/substrait-io/substrait/blob/7f272f13f22cd5f5842baea42bcf7961e6251881/proto/substrait/algebra.proto#L1170) message. In the future, other types might be allowed to handle different use cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single expression is used for filters, while multiple expressions are used for projections and aggregations.

This seems overly prescriptive. For example, I feel like this implies you would only use this for one of those three things. However, someone might want to use this for expression evaluation that is completely unrelated to execution plans or engines.

Maybe something along the lines of "if multiple expressions are specified then there is no implicit relationship between the different top-level expressions. Each expression should be considered independent and operates on the same top-level schema."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I can offer that as example usage, but not present as only valid use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will also add that note about the expressions not depending on each other.


For multi expressions, user can translate them following same order as it occurs in original plan rel. But it does NOT require the consume side to handle it strictly in previous order. Only need to make sure columns in final output are organized in same order as defined in extended expression message.
If there are multiple expressions, the order of the inputs and outputs matter. Input data will match the schema order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "the order of the inputs matter" mean?

I agree that the output schema must align with the expression order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "the field order of the input and output data matters".

@wjones127 wjones127 force-pushed the docs-extended-expressions-grammar branch from afcf9fd to f713073 Compare February 6, 2023 17:32
@wjones127 wjones127 force-pushed the docs-extended-expressions-grammar branch from f713073 to 9278d1a Compare February 9, 2023 19:32
@wjones127 wjones127 requested review from westonpace and vibhatha and removed request for vibhatha and westonpace February 9, 2023 19:32
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few more minor nits.

proto/substrait/extended_expression.proto Outdated Show resolved Hide resolved
@@ -19,33 +19,42 @@ message ExpressionReference {
AggregateFunction measure = 2;
}
// Field names in depth-first order
//
// For primitive types, this will just be a single name. Nested types may
// have multiple names.
repeated string output_names = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a required field? The description below seems to suggest it is but it might not hurt to mention that here.

site/docs/expressions/extended_expression.md Outdated Show resolved Hide resolved
@westonpace
Copy link
Member

CC @yma11 did you want to confirm this description matches your understanding?

@yma11
Copy link
Contributor

yma11 commented Feb 15, 2023

CC @yma11 did you want to confirm this description matches your understanding?

You description is quite detailed and accurate for extended expression usage. Just one point: If multiple expressions are specified then there is no implicit relationship between the different top-level expressions. -> If multiple expressions are specified then there is no implicit relationship between them?

@EpsilonPrime EpsilonPrime added the awaiting-user-input This issue is waiting on further input from users label Aug 16, 2023
@jacques-n
Copy link
Contributor

No progress has been made in more than six months. Closing without prejudice.

@jacques-n jacques-n closed this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-input This issue is waiting on further input from users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants