
[SPARK-43327] Trigger committer.setupJob before plan execute in FileFormatWriter#write #41000

Closed
wants to merge 2 commits

Conversation

zzzzming95
Contributor

What changes were proposed in this pull request?

Trigger committer.setupJob before plan execute in FileFormatWriter#write

Why are the changes needed?

#38358 resolved the case where outputOrdering might not work when AQE is enabled.

However, because that change materializes the AQE plan in advance (it triggers getFinalPhysicalPlan()), committer.setupJob(job) may never execute when AdaptiveSparkPlanExec#getFinalPhysicalPlan() fails with an error.
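
For context, here is a condensed sketch of the ordering this PR establishes in FileFormatWriter#write. It is illustrative only: the comment text and the setupJob/materializeAdaptiveSparkPlan names come from the review discussion below, while the surrounding control flow is simplified and not the exact diff.

// This call shouldn't be put into the `try` block below because it only
// initializes and prepares the job; any exception thrown from here
// shouldn't cause abortJob() to be called. It must run before
// `materializeAdaptiveSparkPlan()`.
committer.setupJob(job)

// Only after setup is the AQE plan materialized. Before this PR,
// getFinalPhysicalPlan() could fail here with setupJob() never having run.
val materializedPlan = materializeAdaptiveSparkPlan(empty2NullPlan)

try {
  // ... run the write job over materializedPlan, then commit ...
  // (schematic: commitMessages stands in for the collected task commit messages)
  committer.commitJob(job, commitMessages)
} catch {
  case cause: Throwable =>
    committer.abortJob(job)
    throw cause
}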

Does this PR introduce any user-facing change?

no

How was this patch tested?

Added a unit test.

github-actions bot added the SQL label Apr 30, 2023
@zzzzming95
Contributor Author

@EnricoMi @cloud-fan @dongjoon-hyun can you take a look? Thanks!

@EnricoMi
Contributor

What is the fallout of committer.setupJob(job) not being executed in the presence of an error?


// This call shouldn't be put into the `try` block below because it only initializes and
// prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
// It must be run before `materializeAdaptiveSparkPlan()`
Contributor

Maybe a similar comment above the line below would also be helpful:

val materializedPlan = materializeAdaptiveSparkPlan(empty2NullPlan)

Contributor Author

What is the fallout of committer.setupJob(job) not being executed in the presence of an error?

Spark deletes the partition location when running an INSERT OVERWRITE.

#41000 (comment)

The new location is then created in committer.setupJob(job), after which the job executes. But #38358 triggered the job execution in advance.

So when the job execution fails, the location path has been deleted but never recreated.
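
To make the failure sequence concrete, a schematic of the order of operations after #38358 (illustrative, based on the discussion above; not the actual diff):

// 1. INSERT OVERWRITE has already deleted the existing partition location.

// 2. After #38358 the AQE plan is materialized first; if
//    getFinalPhysicalPlan() throws here, execution stops...
val materializedPlan = materializeAdaptiveSparkPlan(empty2NullPlan)

// 3. ...so this call, which would recreate the output location, is never
//    reached, and the partition location stays deleted.
committer.setupJob(job)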

@EnricoMi
Contributor

I think Spark 3.2 is EOL; the final patch release, 3.2.4, was a month ago. So this should target branch-3.3.

Note that a similar fix went into master and branch-3.4: #39431

@EnricoMi
Contributor

Is this fixing #38358 (comment)?

@zzzzming95
Contributor Author

Is this fixing #38358 (comment)?

Yes.

@dongjoon-hyun
Member

Hi, @zzzzming95.
According to the Apache Spark versioning policy, Apache Spark 3.2 has already reached EOL and 3.2.4 was the last release. As a result, we close all PRs against branch-3.2.

@zzzzming95
Contributor Author

Hi, @zzzzming95. According to the Apache Spark versioning policy, Apache Spark 3.2 has already reached EOL and 3.2.4 was the last release. As a result, we close all PRs against branch-3.2.

OK, I see a similar implementation for Spark 3.3, and I will submit it against Spark 3.3.

@dongjoon-hyun
Member

@zzzzming95, I'm not sure you are aware of the Apache Spark backporting policy. To prevent future regressions, we start the review from the master branch first, then backport from master to branch-3.4 and branch-3.3.

OK, I see a similar implementation for Spark 3.3, and I will submit it against Spark 3.3.
