[SPARK-24552][CORE][SQL][BRANCH-2.3] Use unique id instead of attempt number for writes.

This passes a unique attempt id instead of attempt number to v2
data sources and hadoop APIs, because attempt number is reused
when stages are retried. When attempt numbers are reused, sources
that track data by partition id and attempt number may incorrectly
clean up data because the same attempt number can be both committed
and aborted.
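
The collision described above can be illustrated with a small sketch. It is not Spark code: `StagingSink` is a hypothetical sink that tracks staged output by partition id and attempt id, and the id values (0, 7, 42) are made up. It assumes the first stage attempt's task commits and a task from the retried stage is later aborted.

```scala
import scala.collection.mutable

// Hypothetical sink that stages output per (partitionId, attemptId); not a Spark API.
final class StagingSink {
  private val staged = mutable.Map.empty[(Int, Int), String]
  private val committed = mutable.Set.empty[(Int, Int)]

  def write(partId: Int, attemptId: Int, data: String): Unit =
    staged((partId, attemptId)) = data

  def commit(partId: Int, attemptId: Int): Unit =
    committed += ((partId, attemptId))

  // Abort discards whatever is tracked under this key, committed or not.
  def abort(partId: Int, attemptId: Int): Unit = {
    staged -= ((partId, attemptId))
    committed -= ((partId, attemptId))
  }

  def committedData: Seq[String] =
    committed.toSeq.sorted.flatMap(staged.get)
}

object AttemptIdCollision {
  // Same schedule each time: the first task commits, a retried-stage task aborts.
  def run(firstId: Int, retryId: Int): Seq[String] = {
    val sink = new StagingSink
    sink.write(3, firstId, "rows from first stage attempt")
    sink.commit(3, firstId)
    sink.write(3, retryId, "rows from retried stage")
    sink.abort(3, retryId)
    sink.committedData
  }

  // Attempt numbers restart in the retried stage, so both tasks share key (3, 0).
  val withAttemptNumber: Seq[String] = run(firstId = 0, retryId = 0)
  // Unique task attempt ids (illustrative values) keep the two keys distinct.
  val withUniqueId: Seq[String] = run(firstId = 7, retryId = 42)
}
```

With the reused attempt number, the abort removes the data that was already committed; with unique ids, the committed rows survive.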

Author: Marcelo Vanzin

Closes apache#21615 from vanzin/SPARK-24552-2.3.
Marcelo Vanzin authored and rdblue committed Aug 13, 2018
1 parent de3e790 commit f05e0e8
Showing 1 changed file with 2 additions and 2 deletions.
@@ -98,8 +98,8 @@ object DataWritingSparkTask extends Logging {
       useCommitCoordinator: Boolean): WriterCommitMessage = {
     val stageId = context.stageId()
     val partId = context.partitionId()
-    val attemptId = context.attemptNumber()
-    val dataWriter = writeTask.createDataWriter(partId, attemptId)
+    val attemptId = context.taskAttemptId().toInt // see SPARK-24552
+    val dataWriter = writeTask.createDataWriter(context.partitionId(), attemptId)

     // write the data and commit this writer.
     Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
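
The added line narrows `context.taskAttemptId()`, which is a `Long` in Spark's `TaskContext`, to an `Int` with `.toInt`. The sketch below only shows what that conversion does in plain Scala; the object and method names are hypothetical, and whether ids ever grow large enough to matter in a given application is not something this commit states.

```scala
object TaskAttemptIdNarrowing {
  // Same narrowing as in the diff: Long to Int keeps only the low 32 bits.
  def narrow(taskAttemptId: Long): Int = taskAttemptId.toInt

  // Values within Int range pass through unchanged.
  val small: Int = narrow(12345L)

  // One past Int.MaxValue wraps around to Int.MinValue.
  val wrapped: Int = narrow(Int.MaxValue.toLong + 1L)

  // Two ids that differ by exactly 2^32 narrow to the same Int.
  val collides: Boolean = narrow(5L) == narrow(5L + (1L << 32))
}
```

The narrowed id is therefore unique only while ids stay within 32 bits of one another.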
