Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2555] Support configuration spark.scheduler.minRegisteredResourcesRatio in Mesos mode. #1462

Closed
wants to merge 3 commits into from

Conversation

li-zhihui
Copy link
Contributor

In SPARK-1946(PR #900), configuration spark.scheduler.minRegisteredExecutorsRatio was introduced(and it was renamed spark.scheduler.minRegisteredResourcesRatio in PR #1525), but it only support Standalone and Yarn mode.
This PR try to introduce the configuration to Mesos mode.

In Mesos coarse-grained mode, the configuration is work well.
In Mesos fine-grained mode, the configuration is ignored because executors are dynamic, and SchedulerBackend will print a warning log if the configuration is set.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@li-zhihui
Copy link
Contributor Author

@tgravescs

@tgravescs
Copy link
Contributor

Did you test it on a cluster? I unfortunately don't have access to one and am not an expert on mesos.

Is there a race condition between when the scheduler backend increments totalExpectedExecutors and when we actually do the check to see if we have enough? Meaning in this case we increment it as they come in as resource offers, we start the executors, it registers, then we do the check, so its possible once we get 1 in that totalExpectedExecutors =1 * minRegisteredRatio (say 100) == executorActor.size() (1) even though we really expect say 10 to come in?

I think the same thing actually applies in standalone mode too but I missed it in previous pr.

@li-zhihui
Copy link
Contributor Author

@tgravescs

I tested it on a cluster with mesos-0.18.1(fine-grained and coarse-grained), it work well.

I think you are right. In fact, user don't have any idea about expected executors in mesos mode (and standalone mode), they only expect CPU cores(spark.cores.max). So we need check total registered executors' cores and spark.cores.max to judge whether SchedulerBackend is ready, and modify spark.scheduler.minRegisteredExecutorsRatio to spark.scheduler.minRegisteredResourcesRatio.
How do you think about it?

@tgravescs
Copy link
Contributor

cc @kayousterhout as I think she is more familiar with standalone mode and scheduler details.

@tgravescs
Copy link
Contributor

If we change the name of the config you'll need to upmerge as #634 set some defaults on the yarn side.

@kayousterhout
Copy link
Contributor

@tgravescs I actually mentioned this race condition in the previous PR: #900 (comment) . In the future we should try to be more careful about merging things that have un-replied to comments (I'm about to send an email to the dev list about this).

@li-zhihui if someone points out a problem in a pull request you submit, the expectation is that it will be fixed when you reply to the comment. Can you please submit a new pull request that fixes the race condition with standalone mode, before we proceed with adding this functionality to Mesos mode?

@tgravescs
Copy link
Contributor

Sorry @kayousterhout I totally missed it, I didn't read close enough and thought it had been addressed.

@kayousterhout
Copy link
Contributor

No worries I should have just made a top-level comment -- the code-level
comments are easy to miss once they get compressed because the code is out
of date.

On Mon, Jul 21, 2014 at 2:35 PM, Tom Graves [email protected]
wrote:

Sorry @kayousterhout https://github.com/kayousterhout I totally missed
it, I didn't read close enough and thought it had been addressed.


Reply to this email directly or view it on GitHub
#1462 (comment).

@li-zhihui
Copy link
Contributor Author

Sorry @tgravescs @kayousterhout I am not aware of the issue's seriousness at that time. thanks @kayousterhout for your coach.

@tnachen
Copy link
Contributor

tnachen commented Jul 28, 2014

Are we trying to figure out the top level issue of the race before we get this in?

@li-zhihui
Copy link
Contributor Author

@tnachen I add a new PR to try to fix the issue, #1525

@pwendell
Copy link
Contributor

pwendell commented Aug 4, 2014

I'd like to see a version of this patch that simply removes support for this in standalone mode - at least so we can do the comparison. @li-zhihui, why can't a user just write this in their own code? It seems really simple...

@li-zhihui
Copy link
Contributor Author

@pwendell removing support for this in standalone mode is just keeping totalExpectedExecutors zero.
li-zhihui@fa5af15

I think it just make user use spark more easily. (And sometimes user isn't aware of the problem unless we show them by docs or conf). Anyway, I think you are the authority on how to make the tradeoff.

kayousterhout added a commit to kayousterhout/spark-1 that referenced this pull request Aug 4, 2014
Current code waits until some minimum fraction of expected executors
have registered before beginning scheduling.  The current code in
standalone mode suffers from a race condition (SPARK-2635). This
race condition could be fixed, but this functionality is easily
achieved by the user (they can use the storage status to determine
how many executors are up, as described by @pwendell in apache#1462)
so adding the extra complexity to the scheduler code is not worthwile.
@kayousterhout
Copy link
Contributor

@pwendell I created #1762 for your judgment of what the right thing to do here is!

@pwendell
Copy link
Contributor

pwendell commented Aug 4, 2014

Okay let me run it by some more people tomorrow and figure it out.

@li-zhihui li-zhihui changed the title [SPARK-2555] Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mes... [SPARK-2555] Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Aug 25, 2014
@li-zhihui
Copy link
Contributor Author

Rollback old commits, add a new commit base on latest code.

@pwendell @tgravescs @kayousterhout @tnachen

@li-zhihui li-zhihui changed the title [SPARK-2555] Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. [SPARK-2555] Support configuration spark.scheduler.minRegisteredResourcesRatio in Mesos mode. Aug 25, 2014
@SparkQA
Copy link

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@nchammas
Copy link
Contributor

This PR has gone stale. Do we want to update it?

@li-zhihui
Copy link
Contributor Author

@pwendell Do we need the feature in mesos mode? I am pleasure to update it.

@@ -62,6 +62,11 @@ private[spark] class MesosSchedulerBackend(

var classLoader: ClassLoader = null

if (!sc.getConf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sc.conf.contains(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@andrewor14
Copy link
Contributor

@pwendell @kayousterhout what is the verdict of this? Should we just remove the ratio altogether? What about backward compatibility?

@li-zhihui
Copy link
Contributor Author

Add some new commits to fix code conflict and some issues.

asfgit pushed a commit that referenced this pull request Feb 26, 2015
…ourcesRatio on docs.

The configuration is not supported in mesos mode now.
See #1462

Author: Li Zhihui <[email protected]>

Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:

63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.

(cherry picked from commit 10094a5)
Signed-off-by: Andrew Or <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 26, 2015
…ourcesRatio on docs.

The configuration is not supported in mesos mode now.
See #1462

Author: Li Zhihui <[email protected]>

Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:

63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.

(cherry picked from commit 10094a5)
Signed-off-by: Andrew Or <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 26, 2015
…ourcesRatio on docs.

The configuration is not supported in mesos mode now.
See #1462

Author: Li Zhihui <[email protected]>

Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:

63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
@andrewor14
Copy link
Contributor

@li-zhihui I didn't realize the spark.scheduler.minRegisteredResourcesRatio defaulted to 0 in standalone mode. Given that I think it's safe to remove support for it, since it will only affect people who explicitly set this. For 1.4 we can deprecate it in standalone mode and for 1.5+ we can remove it.

Given that would you mind closing this?

@li-zhihui li-zhihui closed this Feb 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants