Remove GraphX MessageToPartition for compatibility with sort-based shuffle #1537

Closed
ankurdave wants to merge 2 commits from ankurdave/remove-MessageToPartition

ankurdave
Contributor

MessageToPartition was used in Graph#partitionBy. Unlike a Tuple2, it marked the key as transient to avoid sending it over the network. However, it was incompatible with sort-based shuffle (SPARK-2045) and represented only a minor optimization: for partitionBy, it improved performance by 6.3% (30.4 s to 28.5 s) and reduced communication by 5.6% (114.2 MB to 107.8 MB).
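
To make the mechanism concrete, here is a minimal Java sketch of what marking a key as transient does under JVM serialization. The `Message` class, its `partition` and `data` fields, and the `roundTrip` helper are hypothetical stand-ins for illustration, not the GraphX `MessageToPartition` class, and plain Java serialization is assumed rather than whichever serializer Spark was configured with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientKeyDemo {

    /** Hypothetical keyed message; illustrative only, not the GraphX class. */
    static final class Message implements Serializable {
        private static final long serialVersionUID = 1L;

        // The key: marked transient, so Java serialization skips it.
        transient int partition;
        // The payload: serialized normally.
        final String data;

        Message(int partition, String data) {
            this.partition = partition;
            this.data = data;
        }
    }

    /** Serializes the message to bytes and reads it back, as a network hop would. */
    static Message roundTrip(Message m) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Message) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Message sent = new Message(7, "edge");
        Message received = roundTrip(sent);
        System.out.println(sent.partition + " -> " + received.partition); // prints "7 -> 0"
        System.out.println(received.data);                                // prints "edge"
    }
}
```

The payload survives the round trip, but the key comes back as the default value, so any component that needs the key after deserialization cannot rely on it. That is presumably the kind of conflict with sort-based shuffle the description refers to, though the PR does not spell out the details.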

@ankurdave
Contributor Author

@rxin @mateiz

SparkQA commented Jul 23, 2014

QA tests have started for PR 1537. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17004/consoleFull

Contributor

mateiz commented Jul 23, 2014

Looks good to me assuming tests work.

Contributor

mateiz commented Jul 23, 2014

Thanks for putting this together!

SparkQA commented Jul 23, 2014

QA results for PR 1537:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17004/consoleFull

Contributor

rxin commented Jul 23, 2014

I've merged this. Thanks!

@asfgit asfgit closed this in 6c2be93 Jul 23, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…uffle

MessageToPartition was used in `Graph#partitionBy`. Unlike a Tuple2, it marked the key as transient to avoid sending it over the network. However, it was incompatible with sort-based shuffle (SPARK-2045) and represented only a minor optimization: for partitionBy, it improved performance by 6.3% (30.4 s to 28.5 s) and reduced communication by 5.6% (114.2 MB to 107.8 MB).

Author: Ankur Dave

Closes apache#1537 from ankurdave/remove-MessageToPartition and squashes the following commits:

f9d0054 [Ankur Dave] Remove MessageToPartition
ab71364 [Ankur Dave] Remove unused VertexBroadcastMsg