[SPARK-24552][CORE][SQL][BRANCH-2.3] Use unique id instead of attempt number for writes

This passes a unique attempt id instead of attempt number to v2 data sources and Hadoop APIs, because attempt number is reused when stages are retried. When attempt numbers are reused, sources that track data by partition id and attempt number may incorrectly clean up data because the same attempt number can be both committed and aborted.

Author: Marcelo Vanzin

Closes apache#21615 from vanzin/SPARK-24552-2.3.
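The failure mode described in the commit message can be sketched as a toy Scala simulation. Everything in it is hypothetical: `AttemptKeyDemo`, `Key`, `FakeFileStore` and the `part-<partition>-attempt-<attempt>` naming scheme are illustrative assumptions, not Spark or Hadoop APIs. The sketch assumes a source that names each task's staging file after the partition id and the attempt value it was handed, commits by recording that path, and aborts by deleting it. Stage attempt 0 commits partition 3. The retried stage then reruns partition 3, and because the task's attempt number starts again at 0, the abort deletes the already committed file. Keying by a unique per-attempt id instead (the role `context.taskAttemptId()` plays in the patch) keeps the two attempts apart.

```scala
import scala.collection.mutable

// Toy model of a data source that stages one file per task attempt.
// All names here are hypothetical; none of them are Spark or Hadoop APIs.
object AttemptKeyDemo {
  // What the source uses to identify a task attempt's output.
  final case class Key(partitionId: Int, attempt: Long)

  // In-memory stand-in for the output file system.
  final class FakeFileStore {
    val files = mutable.Map.empty[String, String] // path -> contents
    val committed = mutable.Set.empty[String]     // paths recorded as committed
  }

  // Assumed naming scheme: the staging path is derived from the key alone.
  def path(key: Key): String = s"part-${key.partitionId}-attempt-${key.attempt}"

  def write(store: FakeFileStore, key: Key, rows: String): Unit =
    store.files(path(key)) = rows

  def commit(store: FakeFileStore, key: Key): Unit =
    store.committed += path(key)

  // Cleanup on abort deletes whatever lives at this attempt's path.
  def abort(store: FakeFileStore, key: Key): Unit =
    store.files.remove(path(key))

  // Stage attempt 0 commits partition 3; the retried stage reruns partition 3
  // and that task is aborted. `keyFor(partition, attemptNumber, uniqueId)`
  // decides how the source identifies each attempt.
  // Returns the committed paths that still exist afterwards.
  def simulate(keyFor: (Int, Int, Long) => Key): Set[String] = {
    val store = new FakeFileStore
    val first = keyFor(3, 0, 17L) // attemptNumber 0, unique id 17
    write(store, first, "rows from stage attempt 0")
    commit(store, first)
    val retry = keyFor(3, 0, 42L) // attemptNumber restarts at 0, unique id 42
    write(store, retry, "rows from stage attempt 1")
    abort(store, retry)
    store.committed.filter(store.files.contains).toSet
  }

  // Old behaviour: identify attempts by (partition id, attempt number).
  val byAttemptNumber: (Int, Int, Long) => Key = (p, n, _) => Key(p, n.toLong)
  // Patched behaviour: identify attempts by a unique id.
  val byUniqueId: (Int, Int, Long) => Key = (p, _, id) => Key(p, id)

  def main(args: Array[String]): Unit = {
    println(s"keyed by attempt number: ${simulate(byAttemptNumber)}")
    println(s"keyed by unique id:      ${simulate(byUniqueId)}")
  }
}
```

Running `main` prints `Set()` for the attempt-number keying, because the retried task's abort removed the committed output, and `Set(part-3-attempt-17)` for the unique-id keying.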
jzhuge · Aug 13, 2018 · f05e0e8
1 parent de3e790
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/...re/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala b/...re/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
@@ -98,8 +98,8 @@ object DataWritingSparkTask extends Logging {
       useCommitCoordinator: Boolean): WriterCommitMessage = {
     val stageId = context.stageId()
     val partId = context.partitionId()
-    val attemptId = context.attemptNumber()
-    val dataWriter = writeTask.createDataWriter(partId, attemptId)
+    val attemptId = context.taskAttemptId().toInt // see SPARK-24552
+    val dataWriter = writeTask.createDataWriter(context.partitionId(), attemptId)
 
     // write the data and commit this writer.
     Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
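In the patched line, `context.taskAttemptId()` returns a `Long`, which Spark documents as unique across task attempts within one SparkContext. The value it replaces, `context.attemptNumber()`, is an `Int`, and the `.toInt` keeps the argument passed to `createDataWriter` an `Int`. The following small sketch shows what that narrowing does under Scala's standard `Long.toInt` semantics. The object and method names are illustrative, not part of Spark. Ids that fit in an `Int` pass through unchanged, while larger ids keep only their low 32 bits. Uniqueness after narrowing therefore holds only while ids stay within the `Int` range.

```scala
// Illustrative only: mirrors the `context.taskAttemptId().toInt` conversion.
object NarrowingDemo {
  // Long.toInt keeps the low 32 bits of the value.
  def narrow(taskAttemptId: Long): Int = taskAttemptId.toInt

  def main(args: Array[String]): Unit = {
    // Ids in Int range survive unchanged.
    println(narrow(12345L))
    // The first id past Int.MaxValue wraps to Int.MinValue.
    println(narrow(Int.MaxValue.toLong + 1))
    // Two distinct Long ids 2^32 apart narrow to the same Int.
    println(narrow(5L) == narrow((1L << 32) + 5))
  }
}
```

Running `main` prints `12345`, `-2147483648` and `true`.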