SPARK-2787: Make sort-based shuffle write files directly when there's no sorting/aggregation and # partitions is small #1799
Conversation
QA tests have started for PR 1799. This patch DID NOT merge cleanly!
QA results for PR 1799:
QA tests have started for PR 1799. This patch merges cleanly.
test this please
QA tests have started for PR 1799. This patch merges cleanly.
QA results for PR 1799:
test this please
QA tests have started for PR 1799. This patch merges cleanly.
QA results for PR 1799:
test this please
@rxin / @andrewor14 it would be good if you could review this when you have a chance. This is something we should add in 1.1 since sort-based shuffle is still off by default.
BTW the test failures both times were in a Flume test for streaming, which might just be flaky.
QA tests have started for PR 1799. This patch merges cleanly.
Yeah the flaky tests are fixed here: #1803
Ah cool, glad it's being fixed.
QA results for PR 1799:
val shortShuffleMgrNames = Map(
  "HASH" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
  "SORT" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
val shuffleMgrName = conf.get("spark.shuffle.manager", "HASH")
can we make this case insensitive?
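For illustration, a minimal sketch of how a case-insensitive lookup of these short names could work; the getOrElse fallback to a fully qualified class name is an assumption for the sketch, not necessarily the merged code:

// Sketch: resolve spark.shuffle.manager case-insensitively, accepting either
// a short name ("hash"/"sort") or a fully qualified class name (assumed fallback).
val shortShuffleMgrNames = Map(
  "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
  "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
val shuffleMgrName = conf.get("spark.shuffle.manager", "hash")
val shuffleMgrClass =
  shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)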
Also renamed ExternalSorter.write(Iterator) to insertAll, to match ExternalAppendOnlyMap
Thanks; updated to deal with comments.
QA tests have started for PR 1799. This patch merges cleanly.
QA results for PR 1799:
@rxin does this look okay?
LGTM
I'm merging this in master & branch-1.1 (since sort-based is disabled by default)
Alright, thanks. Going to merge it.
… no sorting/aggregation and # partitions is small

As described in https://issues.apache.org/jira/browse/SPARK-2787, right now sort-based shuffle is more expensive than hash-based for map operations that do no partial aggregation or sorting, such as groupByKey. This is because it has to serialize each data item twice (once when spilling to intermediate files, and then again when merging these files object-by-object). This patch adds a code path to just write separate files directly if the # of output partitions is small, and concatenate them at the end to produce a sorted file.

On the unit test side, I added some tests that force or don't force this bypass path to be used, and checked that our tests for other features (e.g. all the operations) cover both cases.

Author: Matei Zaharia <[email protected]>

Closes #1799 from mateiz/SPARK-2787 and squashes the following commits:

88cf26a [Matei Zaharia] Fix rebase
10233af [Matei Zaharia] Review comments
398cb95 [Matei Zaharia] Fix looking up shuffle manager in conf
ca3efd9 [Matei Zaharia] Add docs for shuffle manager properties, and allow short names for them
d0ae3c5 [Matei Zaharia] Fix some comments
90d084f [Matei Zaharia] Add code path to bypass merge-sort in ExternalSorter, and tests
31e5d7c [Matei Zaharia] Move existing logic for writing partitioned files into ExternalSorter

(cherry picked from commit 6906b69)
Signed-off-by: Reynold Xin <[email protected]>
val shortShuffleMgrNames = Map(
  "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
  "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
val shuffleMgrName = conf.get("spark.shuffle.manager", "hash")
I ran into a problem using these short names: in ShuffleBlockManager, there's a line that looks at the spark.shuffle.manager property to see whether we're using sort-based shuffle:

// Are we using sort-based shuffle?
val sortBasedShuffle =
  conf.get("spark.shuffle.manager", "") == classOf[SortShuffleManager].getName
This won't work properly if the configuration property is set to one of the short names.
We can't just re-assign the property to the full name because the BlockManager will have already been created by this point and it will have created the ShuffleBlockManager with the wrong property value. Similarly, the ShuffleBlockManager can't access SparkEnv to inspect the actual ShuffleManager because it won't be fully initialized.
I think we should perform all configuration normalization / mutation at a single top-level location and then treat the configuration as immutable from that point forward, since that seems easier to reason about. What do you think about moving the aliasing / normalization to the top of SparkEnv?
I'd rather not change the configuration under the user; that would be confusing if they later print it or look at it in the web UI. Instead, maybe add a SparkEnv.getShuffleManagerClass(conf: SparkConf) that can return the real class name.
Also, I'd be fine initializing the ShuffleBlockManager after the ShuffleManager if that works, and using isInstanceOf. That would be the cleanest.
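A rough sketch of the helper suggested here, assuming it lives in SparkEnv and takes only the SparkConf; the name and signature follow the suggestion above and are not the merged code:

import org.apache.spark.SparkConf

// Sketch (assumption, not the merged implementation): map short names to their
// fully qualified class names so callers like ShuffleBlockManager can compare
// against classOf[...].getName without mutating the user's configuration.
def getShuffleManagerClass(conf: SparkConf): String = {
  val shortNames = Map(
    "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
    "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
  val name = conf.get("spark.shuffle.manager", "hash")
  shortNames.getOrElse(name.toLowerCase, name)
}

The check quoted earlier could then compare getShuffleManagerClass(conf) against classOf[SortShuffleManager].getName, regardless of whether the user set a short name or a full class name.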
As described in https://issues.apache.org/jira/browse/SPARK-2787, right now sort-based shuffle is more expensive than hash-based for map operations that do no partial aggregation or sorting, such as groupByKey. This is because it has to serialize each data item twice (once when spilling to intermediate files, and then again when merging these files object-by-object). This patch adds a code path to just write separate files directly if the # of output partitions is small, and concatenate them at the end to produce a sorted file.
On the unit test side, I added some tests that force or don't force this bypass path to be used, and checked that our tests for other features (e.g. all the operations) cover both cases.
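For illustration, a hedged sketch of the kind of check that decides when this bypass path applies; the property name, default threshold, and helper name below are assumptions for the sketch rather than the exact merged code:

import org.apache.spark.SparkConf

// Sketch: bypass the merge-sort path only when the map task does no aggregation
// and no ordering and the number of output partitions is small, so each partition
// can be written to its own file and the files concatenated at the end.
def shouldBypassMergeSort(
    conf: SparkConf,
    numPartitions: Int,
    hasAggregator: Boolean,
    hasOrdering: Boolean): Boolean = {
  // Threshold property name and default value are assumed for illustration.
  val bypassMergeThreshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
  !hasAggregator && !hasOrdering && numPartitions <= bypassMergeThreshold
}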