-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34609][SQL] Unify resolveExpressionBottomUp and resolveExpressionTopDown #31728
Conversation
* doing a bottom up traversal, it will first attempt to resolve d and fail as b has not | ||
* been resolved yet. If `throws` is false, this function will handle the exception by | ||
* returning the original attribute. In this case `d` will be resolved in subsequent passes | ||
* after `b` is resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is mostly from the old resolveExpressionBottomUp
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135717 has finished for PR 31728 at commit
|
resolveExpression( | ||
expr, | ||
resolveColumnByName = u => { | ||
plan.resolve(u.nameParts, resolver).orElse(resolveLiteralFunction(u.nameParts, u, plan)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the orElse
part into resolveExpression
? It seems it is the same between resolveExpressionByPlanOutput
and resolveExpressionByPlanChildren
.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
* Example : | ||
* SELECT a.b FROM t ORDER BY b[0].d | ||
* | ||
* In the above example, in b needs to be resolved before d can be resolved. Given we are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in b
-> b
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from the old code verbatimly. It's actually not accurate as b[0].d
won't fail if b
can't be resolved. It just returns the unresolved expression.
Let me rewrite this doc.
resolveColumnByOrdinal = ordinal => { | ||
val candidates = q.children.flatMap(_.output) | ||
assert(ordinal >= 0 && ordinal < candidates.length) | ||
candidates.apply(ordinal) | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does resolveExpressionTopDown
have this logic originally? Seems I cannot find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it doesn't, but I can't find a reason why we shouldn't have it. This doesn't hurt anyway.
The only difference should be where to resolve the columns: plan output vs plan children output. So that it's easy for developers to decide which one to call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because here looks like we flatten all children outputs and let GetColumnByOrdinal
resolve to ordinal in the flattened outputs. It doesn't like any other GetColumnByOrdinal
usage so I have a question here. It works if the expr/query plan of GetColumnByOrdinal
considers the ordinal correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Since this logic is not really needed anywhere, maybe we can use a stricter definition first. We can say that it only works if the plan has one child and we look up the attribute from output attributes of that child.
I'll update this in the followup PR (I have some other related code cleanup in mind), to avoid waiting for the QA job.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135743 has finished for PR 31728 at commit
|
Test build #135745 has finished for PR 31728 at commit
|
Test build #135749 has finished for PR 31728 at commit
|
thanks for the review, merging to master! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
…ionTopDown ### What changes were proposed in this pull request? It's a bit confusing to see `resolveExpressionBottomUp` and `resolveExpressionTopDown`, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to? `resolveExpressionTopDown` resolves columns using output attributes of the plan children, `resolveExpressionBottomUp` resolves columns using output attributes of the plan itself. This PR unifies `resolveExpressionBottomUp` and `resolveExpressionTopDown` and put the common logic in a new method, and let `resolveExpressionBottomUp` and `resolveExpressionTopDown` just call the new method. This PR also renames `resolveExpressionBottomUp` and `resolveExpressionTopDown` to make the difference clear. ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes apache#31728 from cloud-fan/resolve. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> Cherry-picked from dc78f33 by @kbendickson. (cherry picked from commit d93c561) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
It's a bit confusing to see
resolveExpressionBottomUp
andresolveExpressionTopDown
, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to?resolveExpressionTopDown
resolves columns using output attributes of the plan children,resolveExpressionBottomUp
resolves columns using output attributes of the plan itself.This PR unifies
resolveExpressionBottomUp
andresolveExpressionTopDown
and put the common logic in a new method, and letresolveExpressionBottomUp
andresolveExpressionTopDown
just call the new method. This PR also renamesresolveExpressionBottomUp
andresolveExpressionTopDown
to make the difference clear.Why are the changes needed?
code cleanup
Does this PR introduce any user-facing change?
no
How was this patch tested?
existing tests