[SPARK-22465][FOLLOWUP] Update the number of partitions of default partitioner when defaultParallelism is set #20091
Conversation
cc @sujithjay @mridulm @cloud-fan PTAL
Test build #85436 has finished for PR 20091 at commit
This changes the existing behavior of Spark - the expectation is for RDDs without a partitioner to use spark.default.parallelism. Consider the case of a cogroup of two filtered RDDs - users set parallelism (either explicitly, or implicitly in the case of YARN) to handle these cases.
The major concern is that this changes the existing behavior. A further issue is that we should rethink whether we should rely on defaultParallelism here at all.
@jiangxb1987 I am not disagreeing with your hypothesis that default parallelism might not be optimal in all cases within an application (for example, when different RDDs in an application have widely varying cardinalities). Since spark.default.parallelism is an exposed interface, which applications depend on, changing the semantics here would be a regression in terms of functionality and would break an exposed contract in Spark (#20002 explicitly applied to a documented case where the default does not apply). This is why we have the option of explicitly overriding the number of partitions when the default does not work well.
@mridulm Actually you have a good point that we should be extremely careful in making changes related to an exposed interface, so I'll narrow down the scope of this PR to:
Does this make more sense?
@jiangxb1987 I am not sure I followed that completely.
As now used in the function
Can you code this up? I am really not able to parse that block of text :-)
Sure, will do after I arrive in SF next Monday. Thanks! :)
Force-pushed from 4751463 to 62088ca
Test build #86202 has finished for PR 20091 at commit
// If the existing max partitioner is an eligible one, or its partitions number is larger
// than the default number of partitions, use the existing partitioner.
if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
    defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
This is the core change. I think it makes sense as it fixes a regression in #20002.
If the partitioner is not eligible, but its numPartitions is larger than the default one, we should still pick this partitioner instead of creating a new one.
There are multiple cases here.
a) spark.default.parallelism is not set by the user.
For this case, the PR is a noop.
b) maxPartitions is at least an order of magnitude higher than the max partitioner.
b.1) If spark.default.parallelism is not set, the PR is a noop.
b.2) spark.default.parallelism is explicitly set by the user.
This is the change in behavior being introduced - rely on the user-specified value instead of trying to infer it when the inferred value is off by at least an order of magnitude.
If users were setting suboptimal values for "spark.default.parallelism", there will be a change in behavior - though I would argue this is the expected behavior given the documentation of 'spark.default.parallelism'.
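To make the cases above concrete, here is a minimal self-contained sketch of the decision flow. The names and the n * 10 eligibility approximation are illustrative, not the actual code; the real logic lives in Partitioner.defaultPartitioner.

object DefaultPartitionerSketch {
  // Decide how many partitions the default partitioner should use.
  // maxPartitionerSize: partitions of the largest existing partitioner, if any.
  // maxPartitions: max number of partitions across the upstream RDDs.
  // defaultParallelism: Some(n) iff spark.default.parallelism is set.
  def choosePartitions(
      maxPartitionerSize: Option[Int],
      maxPartitions: Int,
      defaultParallelism: Option[Int]): Int = {
    val defaultNumPartitions = defaultParallelism.getOrElse(maxPartitions)
    maxPartitionerSize match {
      // Reuse the existing partitioner when it is within an order of
      // magnitude of maxPartitions (eligible), or when it beats the default.
      case Some(n) if maxPartitions <= n * 10 || defaultNumPartitions < n => n
      // Otherwise fall back to the default: the user-set parallelism when
      // present (case b.2), else maxPartitions (cases a and b.1).
      case _ => defaultNumPartitions
    }
  }
}

Under this sketch, cases (a) and (b.1) pass defaultParallelism = None, so the fallback is maxPartitions and the PR is indeed a noop; case (b.2) passes Some(userValue).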
It depends on how you define "default". In this case, if we can benefit from reusing an existing partitioner, we should pick that partitioner. If we want to respect spark.default.parallelism strictly, we should not reuse a partitioner at all.
For this particular case, picking the existing partitioner is obviously the better choice, and it was the behavior before #20002, so I'm +1 on this change.
[Edited, hopefully, for clarity]
It depends on how you define "default".
I don't see an ambiguity here - am I missing something?
To rephrase my point - this proposed PR has an impact only if the user has explicitly set 'spark.default.parallelism' - else it is a noop.
If this is not the case (other than the desired behavior of SPARK-22465), I might be missing something; do let me know!
What is the concern here? That users have set incorrect values for spark.default.parallelism?
If we want to respect spark.default.parallelism strictly, we should not reuse partitioner at all.
I agree with you - we should not have - except that ship sailed a long time back; this has been the behavior in Spark since at least 0.5 - I don't have context before that.
Historically, default parallelism was added later - "largest partitioner if set, else largest number of partitions when no partitioner is set" was the behavior. When default parallelism was introduced, the behavior was continued, probably (I guess) for backward compatibility.
#20002 surgically fixed only the case where the inferred partition count was off by at least an order of magnitude.
When it is off by an order of magnitude - don't rely on the largest partitioner; it is not useful due to OOMs.
In this case, if the user has explicitly specified spark.default.parallelism, rely on the user-provided value - else preserve the existing behavior of picking the largest number of partitions.
I think we all agree that reusing a partitioner is the existing behavior and that we should not stick to spark.default.parallelism here.
#20002 is good as it fixes a bad case where reusing a partitioner slows down the query. And this PR surgically fixes one regression introduced by #20002: even if the existing partitioner is not eligible (has very few partitions), it can still be better than falling back to default parallelism.
Thanks for coding it up @jiangxb1987! So if I understand it correctly, the requirements the PR helps with are:
Does it impact any other use case or flow? I want to make sure I am not missing anything.
@mridulm Great write-up! Yeah, it's exactly as you described, and I've copied it to the PR description.
@jiangxb1987 Thanks for clarifying, looks good to me - I will merge it later this evening (assuming someone else does not merge it first :) )
@mridulm Thank you!
* If any of the RDDs already has a partitioner, and the partitioner is an eligible one (with a
* partitions number that is not less than the max number of upstream partitions by an order of
* magnitude), or the number of partitions is larger than the default one, we'll choose the
* existing partitioner.
We should rephrase this for clarity.
How about:
"When available, we choose the partitioner from the rdds with the maximum number of partitions. If this partitioner is eligible (its number of partitions is within an order of magnitude of the maximum number of partitions in rdds), or has a partition count higher than the default number of partitions, we use this partitioner."
.partitionBy(new HashPartitioner(100))
val rdd4 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4)))
  .partitionBy(new HashPartitioner(9))
val rdd5 = sc.parallelize((1 to 10).map(x => (x, x)), 11)
Can we add a case where a partitioner is not used and the default (from spark.default.parallelism) gets used?
For example, something like the following pseudo:
val rdd6 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4))).partitionBy(new HashPartitioner(3))
...
Partitioner.defaultPartitioner(rdd1, rdd6).numPartitions == sc.conf.get("spark.default.parallelism").toInt
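A runnable sketch of such a test, assuming local mode with spark.default.parallelism set; the partition counts (150 vs. 3) are hypothetical, chosen so the existing partitioner is more than an order of magnitude smaller and therefore not eligible:

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("default-partitioner-fallback")
  .set("spark.default.parallelism", "4")
val sc = new SparkContext(conf)

// 150 upstream partitions vs. a 3-partition partitioner: not eligible.
val rdd1 = sc.parallelize((1 to 1000).map(x => (x, x)), 150)
val rdd6 = sc.parallelize(Array((1, 2), (2, 3), (2, 4), (3, 4)))
  .partitionBy(new HashPartitioner(3))

// The default partitioner should fall back to spark.default.parallelism (4),
// since 4 is not smaller than rdd6's 3 partitions.
assert(Partitioner.defaultPartitioner(rdd1, rdd6).numPartitions ==
  sc.getConf.get("spark.default.parallelism").toInt)

sc.stop()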
test("cogroup between multiple RDD when defaultParallelism is set with huge number of " + |
nit: "set; with huge number of partitions in upstream RDDs"
Thanks @jiangxb1987 for the great work!
Test build #86496 has finished for PR 20091 at commit
…rtitioner when defaultParallelism is set

## What changes were proposed in this pull request?

#20002 proposed a way to safety-check the default partitioner; however, if `spark.default.parallelism` is set, the defaultParallelism could still be smaller than the proper number of partitions for the upstream RDDs. This PR extends the approach to address the condition when `spark.default.parallelism` is set. The requirements the PR helps with are:
- Max partitioner is not eligible since it is at least an order of magnitude smaller, and
- User has explicitly set 'spark.default.parallelism', and
- Value of 'spark.default.parallelism' is lower than max partitioner
- Since max partitioner was discarded due to being at least an order of magnitude smaller, default parallelism is worse - even though user specified.

In all other cases, the changes are a no-op.

## How was this patch tested?

Added corresponding test cases in `PairRDDFunctionsSuite` and `PartitioningSuite`.

Author: Xingbo Jiang <[email protected]>

Closes #20091 from jiangxb1987/partitioner.

(cherry picked from commit 96cb60b)
Signed-off-by: Mridul Muralidharan <[email protected]>
…mPartitions is equal to maxPartitioner.numPartitions

## What changes were proposed in this pull request?

Followup of #20091. We could also use the existing partitioner when defaultNumPartitions is equal to the maxPartitioner's numPartitions.

## How was this patch tested?

Existing tests.

Closes #23581 from Ngone51/dev-use-existing-partitioner-when-defaultNumPartitions-equalTo-MaxPartitioner#-numPartitions.

Authored-by: Ngone51 <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
#20002 proposed a way to safety-check the default partitioner; however, if spark.default.parallelism is set, the defaultParallelism could still be smaller than the proper number of partitions for the upstream RDDs. This PR extends the approach to address the condition when spark.default.parallelism is set.
The requirements the PR helps with are:
- Max partitioner is not eligible since it is at least an order of magnitude smaller, and
- User has explicitly set 'spark.default.parallelism', and
- Value of 'spark.default.parallelism' is lower than max partitioner
- Since max partitioner was discarded due to being at least an order of magnitude smaller, default parallelism is worse - even though user specified.
In all other cases, the changes are a no-op.
How was this patch tested?
Added corresponding test cases in PairRDDFunctionsSuite and PartitioningSuite.
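To make the requirement list above concrete, a hypothetical end-to-end illustration (same local-mode assumptions as the earlier test sketch; the partition counts are made up):

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("reuse-ineligible-partitioner")
  .set("spark.default.parallelism", "4")
val sc = new SparkContext(conf)

// 1000 upstream partitions vs. a 9-partition partitioner: not eligible
// (more than an order of magnitude apart)...
val big = sc.parallelize((1 to 1000).map(x => (x, x)), 1000)
val small = sc.parallelize((1 to 10).map(x => (x, x)))
  .partitionBy(new HashPartitioner(9))

// ...but spark.default.parallelism (4) is lower still, so falling back to it
// would be strictly worse: the existing 9-partition partitioner is reused.
assert(Partitioner.defaultPartitioner(big, small).numPartitions == 9)

sc.stop()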