Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34609][SQL] Unify resolveExpressionBottomUp and resolveExpressionTopDown #31728

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

It's a bit confusing to see resolveExpressionBottomUp and resolveExpressionTopDown, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to? resolveExpressionTopDown resolves columns using output attributes of the plan children, resolveExpressionBottomUp resolves columns using output attributes of the plan itself.

This PR unifies resolveExpressionBottomUp and resolveExpressionTopDown and put the common logic in a new method, and let resolveExpressionBottomUp and resolveExpressionTopDown just call the new method. This PR also renames resolveExpressionBottomUp and resolveExpressionTopDown to make the difference clear.

Why are the changes needed?

code cleanup

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

* doing a bottom up traversal, it will first attempt to resolve d and fail as b has not
* been resolved yet. If `throws` is false, this function will handle the exception by
* returning the original attribute. In this case `d` will be resolved in subsequent passes
* after `b` is resolved.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is mostly from the old resolveExpressionBottomUp

@cloud-fan
Copy link
Contributor Author

cc @viirya @maropu @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Mar 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40299/

@SparkQA
Copy link

SparkQA commented Mar 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40299/

@SparkQA
Copy link

SparkQA commented Mar 3, 2021

Test build #135717 has finished for PR 31728 at commit fef599b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

resolveExpression(
expr,
resolveColumnByName = u => {
plan.resolve(u.nameParts, resolver).orElse(resolveLiteralFunction(u.nameParts, u, plan))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the orElse part into resolveExpression ? It seems it is the same between resolveExpressionByPlanOutput and resolveExpressionByPlanChildren .

* Example :
* SELECT a.b FROM t ORDER BY b[0].d
*
* In the above example, in b needs to be resolved before d can be resolved. Given we are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in b -> b?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the old code verbatimly. It's actually not accurate as b[0].d won't fail if b can't be resolved. It just returns the unresolved expression.

Let me rewrite this doc.

Comment on lines +1971 to +1975
resolveColumnByOrdinal = ordinal => {
val candidates = q.children.flatMap(_.output)
assert(ordinal >= 0 && ordinal < candidates.length)
candidates.apply(ordinal)
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does resolveExpressionTopDown have this logic originally? Seems I cannot find it.

Copy link
Contributor Author

@cloud-fan cloud-fan Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't, but I can't find a reason why we shouldn't have it. This doesn't hurt anyway.

The only difference should be where to resolve the columns: plan output vs plan children output. So that it's easy for developers to decide which one to call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because here looks like we flatten all children outputs and let GetColumnByOrdinal resolve to ordinal in the flattened outputs. It doesn't like any other GetColumnByOrdinal usage so I have a question here. It works if the expr/query plan of GetColumnByOrdinal considers the ordinal correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. Since this logic is not really needed anywhere, maybe we can use a stricter definition first. We can say that it only works if the plan has one child and we look up the attribute from output attributes of that child.

I'll update this in the followup PR (I have some other related code cleanup in mind), to avoid waiting for the QA job.

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40327/

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40327/

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40331/

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40331/

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Test build #135743 has finished for PR 31728 at commit 8c1697c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Test build #135745 has finished for PR 31728 at commit 8332a43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 4, 2021

Test build #135749 has finished for PR 31728 at commit 7c2de54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master!

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…ionTopDown

### What changes were proposed in this pull request?

It's a bit confusing to see `resolveExpressionBottomUp` and `resolveExpressionTopDown`, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to? `resolveExpressionTopDown` resolves columns using output attributes of the plan children, `resolveExpressionBottomUp` resolves columns using output attributes of the plan itself.

This PR unifies `resolveExpressionBottomUp` and `resolveExpressionTopDown` and put the common logic in a new method, and let `resolveExpressionBottomUp` and `resolveExpressionTopDown` just call the new method. This PR also renames `resolveExpressionBottomUp` and `resolveExpressionTopDown` to make the difference clear.

### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes apache#31728 from cloud-fan/resolve.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

Cherry-picked from dc78f33 by @kbendickson.

(cherry picked from commit d93c561)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants