-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19256][SQL] Remove ordering enforcement from FileFormatWriter
and let planner do that
#20206
Conversation
…r` into `RunnableCommand`
Test build #85859 has finished for PR 20206 at commit
|
Test build #85891 has finished for PR 20206 at commit
|
Test build #85936 has finished for PR 20206 at commit
|
Jenkins retest this please |
Test build #85946 has finished for PR 20206 at commit
|
cc @cloud-fan @gengliangwang for review |
} else { | ||
// SPARK-21165: the `requiredOrdering` is based on the attributes from analyzed plan, and | ||
// the physical plan may have different attribute ids due to optimizer removing some | ||
// aliases. Here we bind the expression ahead to avoid potential attribute ids mismatch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This concern is still valid, the DataWritingCommand.requiredChildOrdering
is based on logical plan's output attribute ids, how can we safely apply it in DataWritingCommandExec
?
* | ||
* table type | requiredOrdering | ||
* -----------------+------------------------------------------------- | ||
* normal table | partition columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: non-bucketed table
, a partitioned table is not a normal table...
@@ -150,6 +152,10 @@ case class InsertIntoHadoopFsRelationCommand( | |||
} | |||
} | |||
|
|||
val partitionSet = AttributeSet(partitionColumns) | |||
val dataColumns = query.output.filterNot(partitionSet.contains) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use outputColumns
instead of query.output
, cc @gengliangwang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it should be outputColumns
here, which is the output columns of analyzed plan. See #20020 for details.
Hi all, any update here? |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This is as per discussion in #19483 (comment) . Enforcing
Sort
at right places in the tree is something thatEnsureRequirements
should take care of. This PR removesSORT
node insertion done insideFileFormatWriter
.How was this patch tested?
Sort
was added in the plan.