[SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present #9988
Conversation
* DStream checkpointing. Note that implementations of this trait have to implement
* the `setupCheckpointOperation`
*/
trait DStreamCheckpointTester { self: SparkFunSuite =>
This is a refactoring where I extract out `testCheckpointedOperation` so that it can be used in other unit tests.
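For context, a rough sketch of the self-typed trait pattern used here, assuming plain ScalaTest in place of Spark's internal SparkFunSuite; the helper's name matches the PR but its signature here is invented for illustration:

import org.scalatest.funsuite.AnyFunSuite

// The self type restricts where the trait can be mixed in, so the helper
// can call the suite's test(...) registration without extending the suite.
trait CheckpointTester { self: AnyFunSuite =>
  // Hypothetical simplified helper; the real testCheckpointedOperation
  // drives a DStream operation through checkpointing and recovery.
  protected def testCheckpointedOperation(name: String)(body: => Unit): Unit = {
    test(name + " (with checkpoint recovery)")(body)
  }
}

class ExampleSuite extends AnyFunSuite with CheckpointTester {
  testCheckpointedOperation("map over restart") {
    assert(Seq(1, 2, 3).map(_ * 2) == Seq(2, 4, 6))
  }
}

Mixing the trait into each suite this way avoids duplicating the checkpoint-and-restart plumbing across test files.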
@zsxwing Please take a look.
Test build #46730 has finished for PR 9988 at commit
case None =>
  TrackStateRDD.createFromPairRDD[K, V, S, E](
    spec.getInitialStateRDD().getOrElse(new EmptyRDD[(K, S)](ssc.sparkContext)),
    partitioner, validTime
nit: `validTime` should be on a new line.
Done.
Test build #2133 has finished for PR 9988 at commit
Test build #46930 has finished for PR 9988 at commit
retest this please
@@ -56,7 +172,7 @@ class CheckpointSuite extends TestSuiteBase {

   override def afterFunction() {
     super.afterFunction()
-    if (ssc != null) ssc.stop()
+    StreamingContext.getActive().foreach { _.stop() }
If the SparkContext is created in the StreamingContext's constructor but `StreamingContext.stop` is never called, this line cannot stop that SparkContext.
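One way to cover that case, sketched here rather than taken from the final patch, is to stop through the held reference first with `stopSparkContext = true`, then fall back to `getActive()`:

override def afterFunction() {
  super.afterFunction()
  // stop(stopSparkContext = true) also shuts down a SparkContext that the
  // StreamingContext's constructor created, which getActive() alone would
  // miss if ssc.stop() was never called.
  if (ssc != null) {
    ssc.stop(stopSparkContext = true)
  }
  // Additionally stop whatever context is still registered as active.
  StreamingContext.getActive().foreach { _.stop() }
}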
test this please
Test build #46992 has finished for PR 9988 at commit
Test build #47005 has finished for PR 9988 at commit
@@ -730,7 +730,8 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
     val serializableConf = new SerializableJobConf(conf)
     val saveFunc = (rdd: RDD[(K, V)], time: Time) => {
       val file = rddToFileName(prefix, suffix, time)
-      rdd.saveAsHadoopFile(file, keyClass, valueClass, outputFormatClass, serializableConf.value)
+      rdd.saveAsHadoopFile(file, keyClass, valueClass, outputFormatClass,
This is the same change as the one made in #10088.
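The continuation cut off in the diff above presumably wraps the serialized conf in a fresh JobConf per batch, along the lines of this sketch based on the #10088 description, not the verbatim patch:

val saveFunc = (rdd: RDD[(K, V)], time: Time) => {
  val file = rddToFileName(prefix, suffix, time)
  // Give saveAsHadoopFile its own copy of the conf to mutate, so the
  // JobConf that checkpointing serializes is never touched concurrently.
  rdd.saveAsHadoopFile(file, keyClass, valueClass, outputFormatClass,
    new JobConf(serializableConf.value))
}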
Test build #47033 has finished for PR 9988 at commit
Test build #2148 has finished for PR 9988 at commit
…HadoopFiles

The JobConf object created in `DStream.saveAsHadoopFiles` is used concurrently in multiple places:
* The JobConf is updated by `RDD.saveAsHadoopFile()` before the job is launched.
* The JobConf is serialized as part of the DStream checkpoints.

These concurrent accesses (updating in one thread while another thread serializes it) can lead to a ConcurrentModificationException in the underlying Java HashMap used in the internal Hadoop Configuration object.

The solution is to create a new JobConf in every batch, which is updated by `RDD.saveAsHadoopFile()`, while the checkpointing serializes the original JobConf.

Tests to be added in #9988 will fail reliably without this patch. Keeping this patch really small to make sure that it can be added to previous branches.

Author: Tathagata Das <[email protected]>

Closes #10088 from tdas/SPARK-12087.

(cherry picked from commit 8a75a30)
Signed-off-by: Shixiong Zhu <[email protected]>
retest this please
Test build #47037 has finished for PR 9988 at commit
Test build #47085 has finished for PR 9988 at commit
Test build #2153 has finished for PR 9988 at commit
Test build #2152 has finished for PR 9988 at commit
Test build #2155 has finished for PR 9988 at commit
Test build #2154 has finished for PR 9988 at commit
Test build #47109 has finished for PR 9988 at commit
Test build #2159 has finished for PR 9988 at commit
Test build #47133 has finished for PR 9988 at commit
Test build #2158 has finished for PR 9988 at commit
Test build #2160 has finished for PR 9988 at commit
@@ -277,7 +277,7 @@ class CheckpointWriter(
     val bytes = Checkpoint.serialize(checkpoint, conf)
     executor.execute(new CheckpointWriteHandler(
       checkpoint.checkpointTime, bytes, clearCheckpointDataLater))
+    logDebug("Submitted checkpoint of time " + checkpoint.checkpointTime + " writer queue")
note to self: revert this
@zsxwing Can you take a look at this once again?
Test build #47194 has finished for PR 9988 at commit
Test build #2167 has finished for PR 9988 at commit
Test build #2168 has finished for PR 9988 at commit
@@ -25,4 +25,5 @@ log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{

 # Ignore messages below warning level from Jetty, because it's a bit verbose
 log4j.logger.org.spark-project.jetty=WARN
+log4j.appender.org.apache.spark.streaming=DEBUG
nit: should revert this
LGTM except a nit
Test build #47262 has finished for PR 9988 at commit
Thanks @zsxwing. Merging this to master and 1.6.
…ner not present

The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).

While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there is a non-zero chance that the saving and recovery fail. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.

Author: Tathagata Das <[email protected]>

Closes #9988 from tdas/SPARK-11932.

(cherry picked from commit 5d80d8c)
Signed-off-by: Tathagata Das <[email protected]>
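A minimal sketch of that guard follows; the helper name is illustrative and TrackStateRDD's real construction differs:

import scala.reflect.ClassTag
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Reuse the previous state RDD only if it already carries the expected
// partitioner; otherwise (e.g. after recovery from an RDD checkpoint,
// which drops the partitioner per SPARK-12004) repartition it first.
def ensurePartitioned[K: ClassTag, S: ClassTag](
    prevStateRDD: RDD[(K, S)],
    expected: Partitioner): RDD[(K, S)] = {
  prevStateRDD.partitioner match {
    case Some(p) if p == expected => prevStateRDD
    case _ => prevStateRDD.partitionBy(expected)
  }
}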