Explain why broadcasting serialized copy of the task.
rxin committed Jul 18, 2014
1 parent 04b17f0 commit 754085f
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions core/src/main/scala/org/apache/spark/rdd/RDD.scala
@@ -1207,8 +1207,10 @@ abstract class RDD[T: ClassTag](
   // =======================================================================

   /**
-   * Broadcasted copy of this RDD, used to dispatch tasks to executors. Note that this is
-   * a lazy val so the broadcast is created only when tasks are scheduled on this RDD.
+   * Broadcasted copy of this RDD, used to dispatch tasks to executors. Note that we broadcast
+   * the serialized copy of the RDD and for each task we will deserialize it, which means each
+   * task gets a different copy of the RDD. This provides stronger isolation between tasks that
+   * might modify state of objects referenced in their closures.
    */
   @transient private[spark] lazy val broadcasted = {
     val ser = SparkEnv.get.closureSerializer.newInstance()
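
The isolation argument in the new comment can be illustrated outside Spark. The following is a minimal sketch, assuming plain Java serialization in place of Spark's closure serializer and broadcast machinery; `Counter`, `BroadcastIsolationSketch`, `serialize`, and `deserialize` are hypothetical names introduced here and are not part of this commit or of Spark. It shows that deserializing the same bytes twice yields two independent objects, so a mutation made by one "task" is not visible to another.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for mutable state referenced from a task closure.
class Counter(var value: Int) extends Serializable

object BroadcastIsolationSketch {
  // Serialize once, roughly as a driver would before shipping the bytes.
  // Assumption: plain Java serialization, not Spark's closure serializer.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Deserialize per "task": every call builds a fresh object graph.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new Counter(0))

    // Two simulated tasks each deserialize the same serialized bytes.
    val taskA = deserialize[Counter](bytes)
    val taskB = deserialize[Counter](bytes)

    // A mutation in one task does not leak into the other.
    taskA.value += 1
    assert(taskA ne taskB)
    assert(taskB.value == 0)
    println(s"taskA=${taskA.value}, taskB=${taskB.value}")
  }
}
```

Had both tasks shared a single deserialized object instead, the increment made through `taskA` would also be observed through `taskB`, which is the interference the per-task deserialization is meant to avoid.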