Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7237] Many user provided closures are not actually cleaned #5787

Closed
wants to merge 9 commits into from

Conversation

andrewor14
Copy link
Contributor

Note: ~140 lines are tests.

In a nutshell, we never cleaned closures the user provided through the following operations:

  • sortBy
  • keyBy
  • mapPartitions
  • mapPartitionsWithIndex
  • aggregateByKey
  • foldByKey
  • foreachAsync
  • one of the aliases for runJob
  • runApproximateJob

For more details on a reproduction and why they were not cleaned, please see SPARK-7237.

Andrew Or added 3 commits April 29, 2015 12:48
Tests should fail as of this commit because the issue hasn't been
fixed yet.
Now the test added in the previous commit passes!
@andrewor14
Copy link
Contributor Author

@zsxwing @pwendell Please have a look.

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31318 has finished for PR 5787 at commit e83699e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@andrewor14
Copy link
Contributor Author

Um, retest this please

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31314 has finished for PR 5787 at commit 8ac3074.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

def testFilter(rdd: RDD[Int]): Unit = { rdd.filter { _ => return; true }.count() }
def testSortBy(rdd: RDD[Int]): Unit = { rdd.sortBy { _ => return; 1 }.count() }
def testKeyBy(rdd: RDD[Int]): Unit = { rdd.keyBy { _ => return; 1 }.count() }
def testGroupBy(rdd: RDD[Int]): Unit = { rdd.groupBy { _ => return; 1 }.count() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test for testGroupBy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops good catch. I just verified that all other test methods are called.

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31328 has finished for PR 5787 at commit e83699e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zsxwing
Copy link
Member

zsxwing commented Apr 30, 2015

Sorry. Just deleted my wrong comments.

mapPartitionsWithContext needs to be updated too.

@@ -119,7 +119,8 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi
* Applies a function f to each partition of this RDD.
*/
def foreachPartitionAsync(f: Iterator[T] => Unit): FutureAction[Unit] = {
self.context.submitJob[T, Unit, Unit](self, f, Range(0, self.partitions.length),
val cleanedF = self.context.clean(f)
self.context.submitJob[T, Unit, Unit](self, cleanedF, Range(0, self.partitions.length),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to clean here. submitJob will clean f.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31359 has finished for PR 5787 at commit 6498f44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@andrewor14
Copy link
Contributor Author

@zsxwing I didn't update that one (and others like foreachWith) because it's deprecated. I think it's fine to leave those out.

@SparkQA
Copy link

SparkQA commented May 2, 2015

Test build #31640 has finished for PR 5787 at commit df3caa3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
@SparkQA
Copy link

SparkQA commented May 3, 2015

Test build #31675 has finished for PR 5787 at commit 7265865.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Dialect
    • class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)

@andrewor14
Copy link
Contributor Author

Anything else? Can I merge this?

@andrewor14
Copy link
Contributor Author

Alright, I'm merging this into master and 1.4, thanks everyone.

Conflicts:
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
@asfgit asfgit closed this in 1fdabf8 May 5, 2015
asfgit pushed a commit that referenced this pull request May 5, 2015
Note: ~140 lines are tests.

In a nutshell, we never cleaned closures the user provided through the following operations:
- sortBy
- keyBy
- mapPartitions
- mapPartitionsWithIndex
- aggregateByKey
- foldByKey
- foreachAsync
- one of the aliases for runJob
- runApproximateJob

For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).

Author: Andrew Or <[email protected]>

Closes #5787 from andrewor14/clean-more and squashes the following commits:

2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
df3caa3 [Andrew Or] Address comments
7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
6498f44 [Andrew Or] Add missing test for groupBy
e83699e [Andrew Or] Clean one more
8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures

(cherry picked from commit 1fdabf8)
Signed-off-by: Andrew Or <[email protected]>
@andrewor14 andrewor14 deleted the clean-more branch May 5, 2015 16:41
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31886/
Test FAILed.

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Note: ~140 lines are tests.

In a nutshell, we never cleaned closures the user provided through the following operations:
- sortBy
- keyBy
- mapPartitions
- mapPartitionsWithIndex
- aggregateByKey
- foldByKey
- foreachAsync
- one of the aliases for runJob
- runApproximateJob

For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).

Author: Andrew Or <[email protected]>

Closes apache#5787 from andrewor14/clean-more and squashes the following commits:

2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
df3caa3 [Andrew Or] Address comments
7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
6498f44 [Andrew Or] Add missing test for groupBy
e83699e [Andrew Or] Clean one more
8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Note: ~140 lines are tests.

In a nutshell, we never cleaned closures the user provided through the following operations:
- sortBy
- keyBy
- mapPartitions
- mapPartitionsWithIndex
- aggregateByKey
- foldByKey
- foreachAsync
- one of the aliases for runJob
- runApproximateJob

For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).

Author: Andrew Or <[email protected]>

Closes apache#5787 from andrewor14/clean-more and squashes the following commits:

2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
df3caa3 [Andrew Or] Address comments
7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
6498f44 [Andrew Or] Add missing test for groupBy
e83699e [Andrew Or] Clean one more
8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Note: ~140 lines are tests.

In a nutshell, we never cleaned closures the user provided through the following operations:
- sortBy
- keyBy
- mapPartitions
- mapPartitionsWithIndex
- aggregateByKey
- foldByKey
- foreachAsync
- one of the aliases for runJob
- runApproximateJob

For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).

Author: Andrew Or <[email protected]>

Closes apache#5787 from andrewor14/clean-more and squashes the following commits:

2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
df3caa3 [Andrew Or] Address comments
7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
6498f44 [Andrew Or] Add missing test for groupBy
e83699e [Andrew Or] Clean one more
8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants