
[SPARK-23040][CORE]: Returns interruptible iterator for shuffle reader #20449

Closed
wants to merge 10 commits

Conversation

advancedxy
Contributor

What changes were proposed in this pull request?

Before this commit, a non-interruptible iterator was returned from the shuffle reader if an aggregator or ordering was specified; this change wraps the shuffle reader's result in an InterruptibleIterator so the task can respond to cancellation.
This commit also ensures that the sorter is closed even when the task is cancelled (killed) in the middle of sorting.
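A condensed, paraphrased sketch of the approach in BlockStoreShuffleReader.read, grounded in the diff excerpts quoted later in this thread (the surrounding method context is assumed, so this fragment is illustrative rather than the exact patch):

```scala
// Sketch: the sort path registers cleanup, and the final result is wrapped so it can be interrupted.
val resultIter = dep.keyOrdering match {
  case Some(keyOrd: Ordering[K]) =>
    val sorter =
      new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = dep.serializer)
    sorter.insertAll(aggregatedIter)
    // Stop the sorter when the task completes (finished or cancelled), so spill files and
    // memory are released even if the task is killed in the middle of sorting.
    context.addTaskCompletionListener(_ => sorter.stop())
    CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
  case None =>
    aggregatedIter
}
// Wrap again: the aggregator/sorter may already have drained the earlier interruptible
// iterator, so the iterator handed to the caller must itself check for task kill.
new InterruptibleIterator[Product2[K, C]](context, resultIter)
```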

How was this patch tested?

Add a unit test in JobCancellationSuite

Before this commit, a non-interruptible iterator is returned if
aggregator or ordering is specified.
@advancedxy
Contributor Author

ping @cloud-fan

context.addTaskCompletionListener(tc => {
// Note: we only stop sorter if cancelled as sorter.stop wouldn't be called in
// CompletionIterator. Another way would be making sorter.stop idempotent.
if (tc.isInterrupted()) { sorter.stop() }
Contributor

seems we can remove this `if` if we don't return a CompletionIterator.

BTW I think we need to check all the places that use CompletionIterator, to see if they consider job canceling.

Contributor Author

One advantage of CompletionIterator is that the completionFunction will be called as soon as the wrapped iterator is consumed. So for sorter, it will release memory earlier rather than at task completion.
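To illustrate the point, a minimal, hypothetical sketch of CompletionIterator-style semantics (not Spark's actual implementation): the completion function fires as soon as the wrapped iterator is exhausted, which is why the sorter can release memory before the task itself completes.

```scala
// Hypothetical sketch: fire a cleanup callback right after the last element is consumed.
class SimpleCompletionIterator[A](sub: Iterator[A], completion: () => Unit) extends Iterator[A] {
  private var completed = false

  override def next(): A = sub.next()

  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      completion() // e.g. sorter.stop(): free memory/spill files right after the last element
    }
    more
  }
}
```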

As for job cancelling, it's not just CompletionIterator that we should consider. The combiner and sorter pattern (or similar) is something we should look for:

combiner.insertAll(iterator) // or sorter.insertAll(iterator)
// then returns new iterator
combiner.iterator // or sorter.iterator

Contributor

I may be missing something obvious, but seems ExternalSorter.stop() is already idempotent?

Contributor Author

I may be missing something obvious, but seems ExternalSorter.stop() is already idempotent?

Ah, yes. After another look, it's indeed idempotent.
Will update the code.
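For reference, an idempotent stop() generally follows the pattern in this hypothetical sketch (illustrative only, not ExternalSorter's actual code); repeated calls are harmless because each step only touches state that is still present:

```scala
import java.io.File

// Hypothetical sketch of an idempotent stop(): safe to call from both a task-completion
// listener and a CompletionIterator callback, because a second call finds nothing to clean up.
class SpillingSorterSketch {
  private var spillFiles: Seq[File] = Seq.empty
  private var inMemoryBuffer: Array[AnyRef] = null

  def stop(): Unit = {
    spillFiles.foreach(_.delete()) // deleting an already-deleted file is a no-op
    spillFiles = Seq.empty
    if (inMemoryBuffer != null) {  // release the in-memory structures only once
      inMemoryBuffer = null
    }
  }
}
```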

@cloud-fan
Contributor

cc @jiangxb1987

@advancedxy
Contributor Author

ping @cloud-fan and @jiangxb1987.

@jerryshao
Contributor

@advancedxy did you see any actual issue or exception related to this?

@advancedxy
Contributor Author

Hi @jerryshao, I didn't see an exception. But the issue is:
when the stage is aborted and all the remaining tasks are killed, those tasks are not actually cancelled but instead continue running, which wastes executor resources.
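For context, a killed task only stops early if the iterator it is consuming checks the task's interrupted flag. A simplified paraphrase of the idea behind Spark's InterruptibleIterator (not the exact source; only the public TaskContext.isInterrupted() API is relied on here):

```scala
import org.apache.spark.{TaskContext, TaskKilledException}

// Simplified paraphrase: every hasNext re-checks whether the task has been marked killed,
// so a cancelled task stops consuming shuffle data instead of running to completion.
class InterruptibleIteratorSketch[T](context: TaskContext, delegate: Iterator[T])
    extends Iterator[T] {
  override def hasNext: Boolean = {
    if (context.isInterrupted()) {
      throw new TaskKilledException // abort the task as soon as the kill flag is observed
    }
    delegate.hasNext
  }
  override def next(): T = delegate.next()
}
```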

@jerryshao
Contributor

I understand your intention. I was wondering whether you actually hit this issue in production environments, or whether you have a minimal reproduction?

@advancedxy
Contributor Author

advancedxy commented Feb 8, 2018

I was wondering whether you actually hit this issue in production environments,

@jerryshao I met this issue in our production environment when I was debugging a Spark job. I noticed that the aborted stage's tasks continued running until they finished.

I cannot give minimal reproduction code since the failure is related to our mixed (online and offline services) hosts. But you can have a look at the test case I added; it essentially captures the transformation I used, except for the async part.

Currently, I wrap the user-defined iterator in an InterruptibleIterator. However, I believe this is better handled on the Spark side.

@jerryshao
Contributor

I see. Thanks.

.mapPartitions { iter =>
taskStartedSemaphore.release()
// Small delay to ensure that foreach is cancelled if task is killed
Thread.sleep(1000)
Contributor
@jerryshao Feb 8, 2018

I think using sleep will make the UT flaky; you'd better change it to something deterministic.
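For example, a deterministic version of this coordination could look like the sketch below, roughly along the lines of what the test later adopts (a fragment that assumes the suite's SparkContext and the shared taskStartedSemaphore/taskCancelledSemaphore; names and values are illustrative):

```scala
// Sketch: replace the fixed Thread.sleep with explicit driver/task handshaking.
val rdd = sc.parallelize(1 to 1000, 2).map(i => (i, i))
  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
  .mapPartitions { iter =>
    taskStartedSemaphore.release() // tell the driver a task has started
    iter
  }
val f = rdd.foreachAsync { x =>
  // Block until the driver has actually issued the cancel, so the kill deterministically
  // arrives while the task is still consuming the shuffle output.
  taskCancelledSemaphore.acquire()
}

taskStartedSemaphore.acquire()       // wait until at least one task has started
f.cancel()                           // cancel the job while tasks are still running
taskCancelledSemaphore.release(1000) // unblock the tasks; they then hit the interrupt check
```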

Contributor

+1

@advancedxy
Contributor Author

@jerryshao @cloud-fan I have updated my code. Do you have any other concerns?

@@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C](
context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
// Use completion callback to stop sorter if task was cancelled.
Contributor

if task is completed (either finished or canceled)

Contributor Author
@advancedxy left a comment

@cloud-fan Sorry for the delay.

Your comment is addressed.

@@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C](
context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
// Use completion callback to stop sorter if task was completed(either finished/cancelled).
Contributor Author

To fit the 100-character line limit, `or` is replaced by `/`.

Contributor

then we can just write `if task was finished/cancelled`.

import org.scalatest.BeforeAndAfter
import org.scalatest.Matchers

Contributor

this will break the style check

taskStartedSemaphore.release()
iter
}.foreachAsync { x =>
if ( x._1 >= 10) { // this block of code is partially executed.
Contributor

no space after if(

@@ -320,6 +319,55 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft
f2.get()
}

test("Interruptible iterator of shuffle reader") {
Contributor

can we briefly explain what happened in this test?

@advancedxy
Contributor Author

@cloud-fan I have updated the comments and fixed the style issues (the code was previously auto-formatted by IntelliJ).

@cloud-fan
Contributor

ok to test

val f = sc.parallelize(1 to 1000, numSlice).map { i => (i, i) }
.repartitionAndSortWithinPartitions(new HashPartitioner(2))
.mapPartitions { iter =>
taskStartedSemaphore.release()
Contributor

This will be called twice as the root RDD has 2 partitions, so f.cancel might be called before both of these 2 partitions finished.

Contributor Author

f.cancel() should be called before these partitions (tasks) finish, and we want to make sure these tasks can be cancelled.

val taskCompletedSem = new Semaphore(0)
Future {
taskStartedSemaphore.acquire()
f.cancel()
Contributor

what's the expectation for when this f.cancel() should be called?

Contributor Author

Line 372: sem.acquire() is blocked by this Future block, but it looks like we don't need the Future or sem here. I will update the code.

@SparkQA commented Feb 27, 2018

Test build #87703 has finished for PR 20449 at commit ba2f355.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 27, 2018

Test build #87701 has finished for PR 20449 at commit 88e86e0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

taskStartedSemaphore.acquire()
f.cancel()

val e = intercept[SparkException] { f.get() }.getCause
Contributor

nit: intercept[SparkException](f.get()).getCause

Contributor Author

will do

})

taskStartedSemaphore.acquire()
f.cancel()
Contributor
@cloud-fan Feb 27, 2018

We should add some comment to explain when we reach here. From what I am seeing:

  1. taskStartedSemaphore.release() must be called, so at least one task is started.
  2. the first task has processed no more than 10 records, the second task hasn't processed any data, because the reduce stage is not finished and taskCancelledSemaphore.acquire() will be blocked.

Contributor Author

will do

}
})

taskStartedSemaphore.acquire()
Contributor

why not taskStartedSemaphore.acquire(numSlice)?

Contributor Author

As soon as one task starts, we can cancel the job.

@SparkQA commented Feb 27, 2018

Test build #87723 has finished for PR 20449 at commit d6ed9a1.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 27, 2018

Test build #87725 has finished for PR 20449 at commit 8c15c56.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

// execution and a counter is used to make sure that the corresponding tasks are indeed
// cancelled.
import JobCancellationSuite._
val numSlice = 1
Contributor

Can we hardcode it? Using a variable makes people feel like they can change its value and the test will still pass; however, that's not true, as `assert(executionOfInterruptibleCounter.get() <= 10)` needs to be updated too.

Contributor Author

Will update it later.

But it looks like Jenkins has been having trouble these days? Is it back to normal?

Contributor

I'm not sure, let's just try it :)

@cloud-fan
Contributor

LGTM

@cloud-fan
Contributor

retest this please

@SparkQA commented Feb 28, 2018

Test build #87756 has finished for PR 20449 at commit 8c15c56.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@advancedxy
Contributor Author

I'm not sure, let's just try it :)

All right, I finally tracked down why it's hanging on Jenkins.
The global semaphores used by the "Interruptible iterator of shuffle reader" test were being interfered with by tasks from other tests.

Please check the latest change, @cloud-fan

@SparkQA commented Mar 1, 2018

Test build #87813 has finished for PR 20449 at commit 756e0b7.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Reset semaphores if used by multiple tests.
// Note: if other semaphores are shared by multiple tests, please reset them in this block
JobCancellationSuite.taskStartedSemaphore.drainPermits()
JobCancellationSuite.taskCancelledSemaphore.drainPermits()
Contributor

nit: for simplicity, I'd like to reset all semaphores here, instead of thinking about which ones are shared.

Contributor

or we can make all semaphores local, so that we don't need to care about it.

Contributor Author

for simplicity, I'd like to reset all semaphores here, instead of thinking about which ones are shared.

Another way to avoid this problem is: don't reuse semaphores. But that's too verbose.

As for your suggestion: if new semaphores are added by others, how would they know that they're supposed to reset them here? Maybe some comments are needed in the semaphore declarations.

or we can make all semaphores local, so that we don't need to care about it.

No, a global semaphore is required when it is shared between the driver and an executor (another thread in local mode).
See the related PR #4180 for details.
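For reference, the shared state looks roughly like this minimal sketch (the field names follow the test excerpts above; the actual suite object has more members):

```scala
import java.util.concurrent.Semaphore

// Sketch: keeping the semaphores in the suite's companion object means the closures that run
// on executor threads (local mode) and the driver-side test code share the same instances.
object JobCancellationSuite {
  val taskStartedSemaphore = new Semaphore(0)
  val taskCancelledSemaphore = new Semaphore(0)
}
```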

Contributor

Maybe some comments are needed in the semaphore declarations.

+1. It's also good for reviewers; otherwise, figuring out whether a semaphore is shared or not is unnecessary work for reviewers.

@SparkQA commented Mar 1, 2018

Test build #87822 has finished for PR 20449 at commit a3d8ad5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 1, 2018

Test build #87845 has finished for PR 20449 at commit 28119e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@advancedxy
Contributor Author

ping @cloud-fan

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in f2cab56 Mar 5, 2018
@advancedxy
Contributor Author

@cloud-fan is it possible that we also merge this into branch-2.3, so this fix can be released in Spark 2.3.1?

        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
      case None =>
        aggregatedIter
    }
    // Use another interruptible iterator here to support task cancellation as aggregator or(and)
    // sorter may have consumed previous interruptible iterator.
    new InterruptibleIterator[Product2[K, C]](context, resultIter)
Contributor

there is a chance that resultIter is already an InterruptibleIterator, and we should not double-wrap it. Can you send a follow-up PR to fix this? Then we can backport them to 2.3 together.

Contributor Author

Will do
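The follow-up PR (apache#20920, referenced in the commits below) addressed this roughly as in the following sketch: wrap only when the result is not already interruptible (paraphrased, not the exact diff):

```scala
// Sketch of the follow-up: avoid wrapping an InterruptibleIterator in another one.
resultIter match {
  case _: InterruptibleIterator[Product2[K, C]] => resultIter
  case _ =>
    // Use another interruptible iterator here to support task cancellation, since the
    // aggregator or sorter may already have consumed the previous interruptible iterator.
    new InterruptibleIterator[Product2[K, C]](context, resultIter)
}
```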

ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 31, 2018
## What changes were proposed in this pull request?

Address apache#20449 (comment), If `resultIter` is already a `InterruptibleIterator`, don't double wrap it.

## How was this patch tested?
Existing tests.

Author: Xingbo Jiang <[email protected]>

Closes apache#20920 from jiangxb1987/SPARK-23040.
jiangxb1987 pushed a commit to jiangxb1987/spark that referenced this pull request Mar 31, 2018
## What changes were proposed in this pull request?

Before this commit, a non-interruptible iterator is returned if aggregator or ordering is specified.
This commit also ensures that sorter is closed even when task is cancelled(killed) in the middle of sorting.

## How was this patch tested?

Add a unit test in JobCancellationSuite

Author: Xianjin YE <[email protected]>

Closes apache#20449 from advancedxy/SPARK-23040.
jiangxb1987 added a commit to jiangxb1987/spark that referenced this pull request Mar 31, 2018
## What changes were proposed in this pull request?

Address apache#20449 (comment), If `resultIter` is already a `InterruptibleIterator`, don't double wrap it.

## How was this patch tested?
Existing tests.

Author: Xingbo Jiang <[email protected]>

Closes apache#20920 from jiangxb1987/SPARK-23040.
asfgit pushed a commit that referenced this pull request Apr 1, 2018
…fle reader

Backport #20449 and #20920 to branch-2.3

---

## What changes were proposed in this pull request?
Before this commit, a non-interruptible iterator is returned if aggregator or ordering is specified.
This commit also ensures that sorter is closed even when task is cancelled(killed) in the middle of sorting.

## How was this patch tested?
Add a unit test in JobCancellationSuite

Author: Xianjin YE <[email protected]>
Author: Xingbo Jiang <[email protected]>

Closes #20954 from jiangxb1987/SPARK-23040-2.3.
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
## What changes were proposed in this pull request?

Address apache#20449 (comment), If `resultIter` is already a `InterruptibleIterator`, don't double wrap it.

## How was this patch tested?
Existing tests.

Author: Xingbo Jiang <[email protected]>

Closes apache#20920 from jiangxb1987/SPARK-23040.