
[SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode #1525

Closed
wants to merge 5 commits into from

Conversation

li-zhihui
Contributor

In SPARK-1946 (PR #900), the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.

Because the expected number of executors is uncertain in standalone mode, this PR tries to use CPU cores (--total-executor-cores) as the expected resources to judge whether the SchedulerBackend is ready.
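The idea can be sketched as follows. This is an illustrative, self-contained sketch (class and method names are hypothetical, not the actual PR code): in standalone mode the executor count is unknown up front, so readiness is judged against registered cores versus the cores requested via --total-executor-cores.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a core-based readiness check. In standalone mode the
// number of executors is not known a priori, so we count registered cores
// against the requested total instead.
class ReadinessCheck(totalExpectedCores: Int, minRegisteredRatio: Double) {
  val totalRegisteredCores = new AtomicInteger(0)

  def registerExecutor(cores: Int): Unit = {
    totalRegisteredCores.addAndGet(cores)
  }

  // Ready once enough cores have registered; with totalExpectedCores == 0
  // (spark.cores.max unset) this is immediately true, so nothing waits.
  def sufficientResourcesRegistered(): Boolean =
    totalRegisteredCores.get() >= totalExpectedCores * minRegisteredRatio
}
```

For example, with 8 expected cores and a ratio of 0.5, the backend would report ready as soon as 4 cores have registered.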

@AmplabJenkins

Can one of the admins verify this patch?

@li-zhihui
Contributor Author

@kayousterhout @tgravescs

  if (minRegisteredRatio > 1) minRegisteredRatio = 1
- // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
+ // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
Contributor

// Submit tasks time(milliseconds) after minRegisteredRatio is reached

Contributor Author

Thanks @markhamstra, but I think the code means: submit tasks after the wait time if minRegisteredRatio has not been reached.

Contributor

Ah, I see -- sorry. Looks like this is what we want? // Submit tasks after maxRegisteredWaitingTime milliseconds if minRegisteredRatio has not yet been reached

Contributor Author

good, thanks @markhamstra

@tgravescs
Contributor

Can you please also file a JIRA for this?

@tgravescs
Contributor

Jenkins, test this please

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1525. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17006/consoleFull

@li-zhihui li-zhihui changed the title Fix race condition at SchedulerBackend.isReady in standalone mode [SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode Jul 23, 2014
@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1525:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17006/consoleFull

  // is equal to at least this value, that is double between 0 and 1.
- var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
+ var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
  if (minRegisteredRatio > 1) minRegisteredRatio = 1
Contributor

Looks like minRegisteredRatio is only needed within this class and doesn't need to be a var:

private val minRegisteredRatio = math.min(1, conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0))

Contributor

...actually, it doesn't look like it's used at all anymore except in a log message.

Contributor

Sorry, was doing something dumb. Leave it a var but clean up the initialization.


@li-zhihui
Contributor Author

I added a new commit, @tgravescs @markhamstra

  logInfo("SchedulerBackend is ready for scheduling beginning after waiting " +
-   "maxRegisteredExecutorsWaitingTime: " + maxRegisteredWaitingTime)
+   "maxRegisteredResourcesWaitingTime(ms): " + maxRegisteredWaitingTime)
Contributor

I'd do these two log messages with string interpolation instead of using +. http://docs.scala-lang.org/overviews/core/string-interpolation.html
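To illustrate the suggestion, here is the same message built both ways (the wait-time value is a made-up placeholder); the s-interpolator keeps the embedded value inline with the text:

```scala
// Placeholder value for illustration only.
val maxRegisteredWaitingTime = 30000

// With '+' concatenation:
val concatenated =
  "SchedulerBackend is ready for scheduling beginning after waiting " +
    "maxRegisteredResourcesWaitingTime(ms): " + maxRegisteredWaitingTime

// With Scala's s-interpolator, the expression sits inside the literal:
val interpolated =
  "SchedulerBackend is ready for scheduling beginning after waiting " +
    s"maxRegisteredResourcesWaitingTime(ms): $maxRegisteredWaitingTime"

// Both produce the same string.
```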

Contributor Author

@markhamstra thanks

@tgravescs
Contributor

Jenkins, this is okay to test

@tgravescs
Contributor

Jenkins, test this please

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1525. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17033/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1525:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17033/consoleFull

@@ -36,6 +36,7 @@ private[spark] class SparkDeploySchedulerBackend(
var shutdownCallback : (SparkDeploySchedulerBackend) => Unit = _

val maxCores = conf.getOption("spark.cores.max").map(_.toInt)
+ totalExpectedResources.getAndSet(maxCores.getOrElse(0))
Contributor

Why "getAndSet" here instead of just "set"?
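The distinction can be shown in isolation (variable names here just mirror the diff above): getAndSet atomically stores a new value and returns the previous one, so when that returned value is discarded, a plain set() expresses the intent more directly.

```scala
import java.util.concurrent.atomic.AtomicInteger

val totalExpectedResources = new AtomicInteger(0)

// getAndSet stores 8 and hands back the old value (0) — useful only if
// the caller actually wants the previous value.
val previous = totalExpectedResources.getAndSet(8)

// set just stores; nothing is returned, which matches the intent here.
totalExpectedResources.set(16)
```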

Contributor Author

oops, thanks @kayousterhout

@kayousterhout
Contributor

I find it a bit confusing that "totalRegisteredResources" can refer to cores (in standalone mode) or executors (in Yarn mode). Can we just use different variables in each of the two cases and name them appropriately (so totalRegisteredCores and totalRegisteredExecutors)?

@li-zhihui
Contributor Author

@kayousterhout I guess you mean "totalExpectedResources" (there is no variable "totalRegisteredResources").

Now totalRegisteredCores is totalCoreCount, and totalRegisteredExecutors is totalExecutors.

@li-zhihui
Contributor Author

@kayousterhout I think replacing totalExpectedResources with totalExpectedCores and totalExpectedExecutors is a good idea, thanks.

@li-zhihui
Contributor Author

@tgravescs @kayousterhout I added a new commit.

@li-zhihui
Contributor Author

@tgravescs @kayousterhout can you close this PR before the code freeze of the 1.1 release? Otherwise it would result in an incompatible configuration property name, because the PR renames spark.scheduler.maxRegisteredExecutorsWaitingTime to spark.scheduler.maxRegisteredResourcesWaitingTime.

@kayousterhout
Contributor

I will take a look at this tomorrow.


@@ -40,6 +41,10 @@ private[spark] class YarnClusterSchedulerBackend(
}
// System property can override environment variable.
numExecutors = sc.getConf.getInt("spark.executor.instances", numExecutors)
- totalExpectedExecutors.set(numExecutors)
+ totalExpectedExecutors = numExecutors
Contributor

nit: make this one line with the one above


@@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
{
// Use an atomic variable to track total number of cores in the cluster for simplicity and speed
var totalCoreCount = new AtomicInteger(0)
- var totalExpectedExecutors = new AtomicInteger(0)
+ var totalExecutors = new AtomicInteger(0)
Contributor

one more nit: can we call this totalRegisteredExecutors?


@pwendell
Contributor

pwendell commented Aug 2, 2014

Hey all, @kayousterhout asked me to look at this. To me, it doesn't make semantic sense to expose spark.scheduler.minRegisteredExecutorsRatio in standalone mode, because applications in standalone mode do not request a fixed number of executors a priori. So my proposal is to remove this feature in standalone mode altogether. The semantics of spark.cores.max is just a maximum. In some cases users run jobs with this set well above the number of available cores (because they decided to run on a smaller cluster), and that is fully supported. If we enforce a minimum, it will cause all jobs to hang for those users.

If users in standalone mode want to wait, for now they should add their own code to sleep until the desired number of executors appears. They can do this by calling sc.getExecutorStorageStatus.size(), or we could also add an API called sc.getNumExecutors that does this.
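A minimal sketch of such user-side code, under the assumption that the application can supply an executor-count function (in a real job that might be backed by something like () => sc.getExecutorStorageStatus.size; the helper name and signature here are made up for illustration):

```scala
// Hypothetical helper: poll an executor count until the desired number
// appears or a timeout elapses. Returns true if the count was reached.
def waitForExecutors(currentCount: () => Int,
                     desired: Int,
                     timeoutMs: Long,
                     pollMs: Long = 100): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (currentCount() < desired && System.currentTimeMillis() < deadline) {
    Thread.sleep(pollMs)
  }
  currentCount() >= desired
}
```

An application would call this once after creating its SparkContext, before submitting latency-sensitive stages.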

Also curious what @mateiz and @aarondav think about this. I haven't been following this patch previously.

@pwendell
Contributor

pwendell commented Aug 2, 2014

I guess this is currently disabled by default, so that's good (it won't break behavior), but I still don't think spark.cores.max is really meant to be used this way.

@kayousterhout
Contributor

My preference is also to remove this for standalone mode (as mentioned in the original PR, #900) -- but adding @tgravescs who looked quite a bit at the original PR to see if we're forgetting something important here!

@li-zhihui
Contributor Author

Maybe we should consider this feature in standalone mode and Mesos mode together.
Is it necessary in Mesos mode? #1462
@tnachen

@tnachen
Contributor

tnachen commented Aug 2, 2014

If what @pwendell said about spark.cores.max is true, then I think we probably need to rethink how to approach this regardless of what mode we're in. I lean towards safety first, since this is just an optimization.
I'm still getting up to speed on scheduling in Spark, but I believe the locality calculation is shared in TaskScheduler, right?

@li-zhihui
Contributor Author

As @pwendell says, the configuration spark.scheduler.minRegisteredExecutorsRatio is disabled in standalone mode by default. And in the worst case, it sleeps for spark.scheduler.maxRegisteredResourcesWaitingTime (just like the current recommendation). So my opinion is to keep the configuration in all deploy modes (yarn, standalone, mesos).

@tnachen
Contributor

tnachen commented Aug 4, 2014

I see, the max waiting time does mitigate this. I think it's minimally worth adding a comment on the configuration to warn about the potential hanging, and to note that the max timeout will not let the scheduler wait forever.

@pwendell
Contributor

pwendell commented Aug 7, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 7, 2014

QA tests have started for PR 1525. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18109/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA results for PR 1525:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18109/consoleFull

@tgravescs
Contributor

We should figure out before we release 1.1 which modes we are going to support this on and make sure the config name is good. I'm fine either way, as long as it works on the YARN side.

@kayousterhout
Contributor

I think the verdict here is to leave this feature in, and this patch looks good to me if it looks good to you, Tom.


@tgravescs
Contributor

Changes look good to me.

@pwendell
Contributor

pwendell commented Aug 9, 2014

Okay cool LGTM too... after some thought I'm okay to leave it in. The name right now is a bit awkward, but I don't see a better option at this point. Thanks @kayousterhout and @tgravescs for looking at this.

asfgit pushed a commit that referenced this pull request Aug 9, 2014
…lone mode

In SPARK-1946 (PR #900), the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.

Because the expected number of executors is uncertain in standalone mode, this PR tries to use CPU cores (--total-executor-cores) as the expected resources to judge whether the SchedulerBackend is ready.

Author: li-zhihui <[email protected]>
Author: Li Zhihui <[email protected]>

Closes #1525 from li-zhihui/fixre4s and squashes the following commits:

e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
ca54bd9 [li-zhihui] Format log with String interpolation
88c7dc6 [li-zhihui] Few codes and docs refactor
41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
(cherry picked from commit 28dbae8)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 28dbae8 Aug 9, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
9 participants