[AUDIT][SPARK-40086] Changes to physical plan class hierarchy to reduce redundant shuffles and sorts #7727

Open
andygrove opened this issue Feb 10, 2023 · 2 comments
Labels
audit_3.4.0 Audit related tasks for 3.4.0 feature request New feature or request performance A performance related task/issue

Comments

@andygrove
Contributor

Is your feature request related to a problem? Please describe.
In apache/spark@b122436564, several physical operators (aggregates, projections, limits, and TakeOrderedAndProjectExec) changed which traits they implement.

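They now extend OrderPreservingUnaryExecNode and PartitioningPreservingUnaryExecNode, which lets the planner recognize that a child's ordering or partitioning is still valid after the operator and so avoid redundant shuffles and sorts.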
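
To make the idea concrete, here is a minimal, self-contained sketch of how an alias-aware projection might carry a child's ordering and partitioning through a rename. Everything in it (`AliasAwareSketch`, `SortKey`, `Project`, and the column names) is a hypothetical stand-in invented for illustration, not Spark's or this plugin's actual classes. It models partitioning as a plain list of hash keys and assumes each column is either passed through, renamed once, or dropped.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object AliasAwareSketch {
  // One sort key on a named column.
  final case class SortKey(column: String, ascending: Boolean = true)

  // A projection that renames some child columns (child name -> output name)
  // and passes others through unchanged. Any other child column is dropped.
  final case class Project(aliases: Map[String, String], passThrough: Set[String]) {

    // Translate a child column name to its output name, if it survives.
    private def rewrite(column: String): Option[String] =
      if (passThrough(column)) Some(column) else aliases.get(column)

    // The child's ordering stays valid only for the leading keys that still
    // exist in the output; once one key is dropped, later keys are unusable.
    def outputOrdering(childOrdering: Seq[SortKey]): Seq[SortKey] =
      childOrdering
        .map(key => rewrite(key.column).map(name => key.copy(column = name)))
        .takeWhile(_.isDefined)
        .flatten

    // Hash partitioning is only reportable if every key survives the projection.
    def outputPartitioning(childKeys: Seq[String]): Option[Seq[String]] = {
      val rewritten = childKeys.map(rewrite)
      if (rewritten.forall(_.isDefined)) Some(rewritten.flatten) else None
    }
  }

  def main(args: Array[String]): Unit = {
    // SELECT id AS key, ts FROM child
    val project = Project(aliases = Map("id" -> "key"), passThrough = Set("ts"))

    // Child sorted by (id, ts): both keys survive, with id reported as key.
    println(project.outputOrdering(Seq(SortKey("id"), SortKey("ts"))))

    // Child sorted by (dropped, ts): the leading key is gone, so no ordering remains.
    println(project.outputOrdering(Seq(SortKey("dropped"), SortKey("ts"))))

    // Child hash-partitioned on id: reported as partitioned on key, so a
    // downstream operator keyed on key would not need another shuffle.
    println(project.outputPartitioning(Seq("id")))
  }
}
```

In this model, a parent operator that requires data partitioned or sorted on `key` can see that the requirement is already met, which is the situation where the extra shuffle or sort becomes redundant.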

Describe the solution you'd like
We likely need to make similar changes if we want to get the same performance improvements.

Describe alternatives you've considered

Additional context

@andygrove andygrove added feature request New feature or request ? - Needs Triage Need team to review and classify audit_3.4.0 Audit related tasks for 3.4.0 labels Feb 10, 2023
@mattahrens mattahrens added performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify labels Feb 14, 2023
@mattahrens
Collaborator

Similar to #7501

@mythrocks
Collaborator

Tacking on apache/spark#40137 because it is related. Once AliasAwareOutputExpression is supported, it would be good to verify that partitioning/ordering clauses that reference non-existent attributes do not affect the output.

The (possibly contrived) test was added in this commit:
apache/spark@149458c50d
