
SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus #4234

Closed
wants to merge 1 commit

Conversation

@srowen (Member) commented Jan 27, 2015

This just unpersist()s each RDD in this code that was cache()ed.

@SparkQA commented Jan 27, 2015

Test build #26194 has started for PR 4234 at commit c0311bb.

  • This patch merges cleanly.

@ankurdave (Contributor):

I don't think this will always have the desired effect. In the cases where you unpersist upstream RDDs before materializing their results, it's equivalent to never caching anything at all. Rather than

val start = System.currentTimeMillis
val a = sc.parallelize(Array(1)).map(x => { Thread.sleep(5000); x}).cache()
a.count() // Use a once
val b = a.map(identity).cache() // Use a again
// Problem: b has not been computed yet, so a hasn't been used
a.unpersist()
b.count() // Now compute b using a
System.currentTimeMillis - start // => 10 seconds

the correct pattern is to materialize a result (here b) before unpersisting its upstream RDDs:

val start = System.currentTimeMillis
val a = sc.parallelize(Array(1)).map(x => { Thread.sleep(5000); x}).cache()
a.count() // Use a once
val b = a.map(identity).cache()
b.count() // materialize b (cache and compute it), forcing a to be used again
a.unpersist()
System.currentTimeMillis - start // => 5 seconds

@SparkQA commented Jan 28, 2015

Test build #26194 has finished for PR 4234 at commit c0311bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26194/

@srowen (Member, Author) commented Jan 28, 2015

@ankurdave Ah of course. Hm, the thing is there is a loop of transformations here, so in theory you'd have to save up all the RDD references and unpersist at the very end -- but even that may be insufficient as the return value isn't even materialized before it's returned. Also, operations like outerJoinVertices seem to cache() vertices internally too. It's a bit like the old days of memory management!

What's the right thing here? Don't cache() in this method at all, since lower layers do some of that management? Or force materialization?

@ankurdave (Contributor):

@srowen Caching here is necessary since we do reuse some uncached datasets, so I don't think there's a good solution for uncaching in this kind of situation.

Ideally Spark would handle this by refcounting RDDs, but the best we can do for now is probably to force materialization as PageRank and Pregel do.
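The forced-materialization approach mentioned here can be sketched roughly as follows (a minimal illustration only, not the actual PageRank or Pregel source; it assumes an existing SparkContext `sc`):

```scala
// Sketch of the materialize-then-unpersist pattern used by iterative
// algorithms: each iteration caches and counts the new RDD BEFORE
// dropping the previous one, so the cached data is actually reused.
var curr = sc.parallelize(1 to 100).cache()
curr.count()                        // materialize the initial RDD
for (i <- 1 to 10) {
  val next = curr.map(_ + 1).cache()
  next.count()                      // materialize next while curr is still cached
  curr.unpersist(blocking = false)  // now safe to drop the old iteration
  curr = next
}
```

The count() here exists purely to force evaluation; without it, unpersisting `curr` would make the lineage recompute from scratch, as in the anti-pattern example above.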

@srowen (Member, Author) commented Jan 29, 2015

OK, so add some count() calls to force materialization of the cached RDDs? Or just treat this as won't-fix?

@ankurdave (Contributor):

Adding count() calls would be good.

@srowen (Member, Author) commented Jan 29, 2015

@ankurdave Hm, so I looked again and quickly realized the things being cached are Graphs, which consist of several RDDs. It's possible to add count() calls for both vertices and edges, or a generic materialize() method that counts internally; either way, that adds an API method to an abstract class here. That works, although it might be viewed as overkill -- or just as what has to be done? Wanted to check before I take a shot at it; this may be outside my expertise, as I haven't worked with GraphX.

@ankurdave (Contributor):

@srowen The standard way to materialize a Graph is just to do graph.vertices.count(); graph.edges.count(). Due to the dependency structure this in fact materializes everything inside the Graph. A materialize() method that does this might be useful, but it does seem a bit like overkill.
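A materialize() helper along those lines could look like the sketch below (`materializeGraph` is a hypothetical name, not part of the GraphX API; requires Spark and GraphX on the classpath):

```scala
import org.apache.spark.graphx.Graph

// Hypothetical helper: forces computation of the RDDs backing a Graph.
// Counting both vertices and edges materializes everything inside the
// Graph, due to its internal dependency structure.
def materializeGraph[VD, ED](g: Graph[VD, ED]): Graph[VD, ED] = {
  g.vertices.count()
  g.edges.count()
  g
}
```

After calling this on a newly cached graph, the graphs it was derived from can be safely unpersisted without triggering recomputation later.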

@SparkQA commented Jan 30, 2015

Test build #26432 has started for PR 4234 at commit 350bb2b.

  • This patch merges cleanly.

@SparkQA commented Jan 30, 2015

Test build #26432 has finished for PR 4234 at commit 350bb2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26432/

@srowen (Member, Author) commented Feb 8, 2015

@ankurdave What do you think of the result here?

@SparkQA commented Feb 12, 2015

Test build #27357 has started for PR 4234 at commit 66c1e11.

  • This patch merges cleanly.

@SparkQA commented Feb 12, 2015

Test build #27357 has finished for PR 4234 at commit 66c1e11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27357/

@srowen (Member, Author) commented Feb 12, 2015

Jenkins, retest this please.

@SparkQA commented Feb 12, 2015

Test build #27358 has started for PR 4234 at commit 66c1e11.

  • This patch merges cleanly.

@SparkQA commented Feb 12, 2015

Test build #27358 has finished for PR 4234 at commit 66c1e11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27358/

@srowen (Member, Author) commented Feb 12, 2015

@ankurdave Unless you have more comments, I think this can be merged in a bit. The pattern here seems sound: with X cached, derive Y from X, cache Y, materialize Y, then unpersist X.
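That pattern, as a minimal Spark sketch (assuming an existing SparkContext `sc`; the RDDs here are illustrative, not from SVDPlusPlus):

```scala
// X is cached and already materialized.
val x = sc.parallelize(1 to 1000).cache()
x.count()

// Derive Y from X and cache it.
val y = x.map(_ * 2).cache()

// Materialize Y while X is still cached, so computing Y reuses X.
y.count()

// Only now is it safe to drop X without forcing recomputation later.
x.unpersist()
```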

@ankurdave (Contributor):

LGTM

@asfgit asfgit closed this in 0ce4e43 Feb 14, 2015
asfgit pushed a commit that referenced this pull request Feb 14, 2015
This just unpersist()s each RDD in this code that was cache()ed.

Author: Sean Owen <[email protected]>

Closes #4234 from srowen/SPARK-3290 and squashes the following commits:

66c1e11 [Sean Owen] unpersist() each RDD that was cache()ed

(cherry picked from commit 0ce4e43)
Signed-off-by: Ankur Dave <[email protected]>
@ankurdave (Contributor):

Thanks! Merged into master & branch-1.3.

@srowen srowen deleted the SPARK-3290 branch February 14, 2015 21:05