-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7237] Many user provided closures are not actually cleaned #5787
Conversation
Tests should fail as of this commit because the issue hasn't been fixed yet.
Now the test added in the previous commit passes!
Test build #31318 has finished for PR 5787 at commit
|
Um, retest this please |
Test build #31314 has finished for PR 5787 at commit
|
def testFilter(rdd: RDD[Int]): Unit = { rdd.filter { _ => return; true }.count() } | ||
def testSortBy(rdd: RDD[Int]): Unit = { rdd.sortBy { _ => return; 1 }.count() } | ||
def testKeyBy(rdd: RDD[Int]): Unit = { rdd.keyBy { _ => return; 1 }.count() } | ||
def testGroupBy(rdd: RDD[Int]): Unit = { rdd.groupBy { _ => return; 1 }.count() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No test for testGroupBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops good catch. I just verified that all other test methods are called.
Test build #31328 has finished for PR 5787 at commit
|
Sorry. Just deleted my wrong comments.
|
@@ -119,7 +119,8 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi | |||
* Applies a function f to each partition of this RDD. | |||
*/ | |||
def foreachPartitionAsync(f: Iterator[T] => Unit): FutureAction[Unit] = { | |||
self.context.submitJob[T, Unit, Unit](self, f, Range(0, self.partitions.length), | |||
val cleanedF = self.context.clean(f) | |||
self.context.submitJob[T, Unit, Unit](self, cleanedF, Range(0, self.partitions.length), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to clean here. submitJob
will clean f
.
Test build #31359 has finished for PR 5787 at commit
|
@zsxwing I didn't update that one (and others like |
Test build #31640 has finished for PR 5787 at commit
|
Conflicts: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
Test build #31675 has finished for PR 5787 at commit
|
Anything else? Can I merge this? |
Alright, I'm merging this into master and 1.4, thanks everyone. |
Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala
Note: ~140 lines are tests. In a nutshell, we never cleaned closures the user provided through the following operations: - sortBy - keyBy - mapPartitions - mapPartitionsWithIndex - aggregateByKey - foldByKey - foreachAsync - one of the aliases for runJob - runApproximateJob For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237). Author: Andrew Or <[email protected]> Closes #5787 from andrewor14/clean-more and squashes the following commits: 2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more df3caa3 [Andrew Or] Address comments 7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 6498f44 [Andrew Or] Add missing test for groupBy e83699e [Andrew Or] Clean one more 8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app 9ac5f9b [Andrew Or] Clean closures that are not currently cleaned 19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures (cherry picked from commit 1fdabf8) Signed-off-by: Andrew Or <[email protected]>
Test FAILed. |
Note: ~140 lines are tests. In a nutshell, we never cleaned closures the user provided through the following operations: - sortBy - keyBy - mapPartitions - mapPartitionsWithIndex - aggregateByKey - foldByKey - foreachAsync - one of the aliases for runJob - runApproximateJob For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237). Author: Andrew Or <[email protected]> Closes apache#5787 from andrewor14/clean-more and squashes the following commits: 2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more df3caa3 [Andrew Or] Address comments 7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 6498f44 [Andrew Or] Add missing test for groupBy e83699e [Andrew Or] Clean one more 8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app 9ac5f9b [Andrew Or] Clean closures that are not currently cleaned 19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
Note: ~140 lines are tests. In a nutshell, we never cleaned closures the user provided through the following operations: - sortBy - keyBy - mapPartitions - mapPartitionsWithIndex - aggregateByKey - foldByKey - foreachAsync - one of the aliases for runJob - runApproximateJob For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237). Author: Andrew Or <[email protected]> Closes apache#5787 from andrewor14/clean-more and squashes the following commits: 2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more df3caa3 [Andrew Or] Address comments 7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 6498f44 [Andrew Or] Add missing test for groupBy e83699e [Andrew Or] Clean one more 8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app 9ac5f9b [Andrew Or] Clean closures that are not currently cleaned 19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
Note: ~140 lines are tests. In a nutshell, we never cleaned closures the user provided through the following operations: - sortBy - keyBy - mapPartitions - mapPartitionsWithIndex - aggregateByKey - foldByKey - foreachAsync - one of the aliases for runJob - runApproximateJob For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237). Author: Andrew Or <[email protected]> Closes apache#5787 from andrewor14/clean-more and squashes the following commits: 2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more df3caa3 [Andrew Or] Address comments 7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 6498f44 [Andrew Or] Add missing test for groupBy e83699e [Andrew Or] Clean one more 8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app 9ac5f9b [Andrew Or] Clean closures that are not currently cleaned 19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
Note: ~140 lines are tests.
In a nutshell, we never cleaned closures the user provided through the following operations:
For more details on a reproduction and why they were not cleaned, please see SPARK-7237.