-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2555] Support configuration spark.scheduler.minRegisteredResourcesRatio in Mesos mode. #1462
Conversation
Can one of the admins verify this patch? |
Did you test it on a cluster? I unfortunately don't have access to one and am not an expert on mesos. Is there a race condition between when the scheduler backend increments totalExpectedExecutors and when we actually do the check to see if we have enough? Meaning in this case we increment it as they come in as resource offers, we start the executors, it registers, then we do the check, so its possible once we get 1 in that totalExpectedExecutors =1 * minRegisteredRatio (say 100) == executorActor.size() (1) even though we really expect say 10 to come in? I think the same thing actually applies in standalone mode too but I missed it in previous pr. |
I tested it on a cluster with mesos-0.18.1(fine-grained and coarse-grained), it work well. I think you are right. In fact, user don't have any idea about expected executors in mesos mode (and standalone mode), they only expect CPU cores( |
cc @kayousterhout as I think she is more familiar with standalone mode and scheduler details. |
If we change the name of the config you'll need to upmerge as #634 set some defaults on the yarn side. |
@tgravescs I actually mentioned this race condition in the previous PR: #900 (comment) . In the future we should try to be more careful about merging things that have un-replied to comments (I'm about to send an email to the dev list about this). @li-zhihui if someone points out a problem in a pull request you submit, the expectation is that it will be fixed when you reply to the comment. Can you please submit a new pull request that fixes the race condition with standalone mode, before we proceed with adding this functionality to Mesos mode? |
Sorry @kayousterhout I totally missed it, I didn't read close enough and thought it had been addressed. |
No worries I should have just made a top-level comment -- the code-level On Mon, Jul 21, 2014 at 2:35 PM, Tom Graves [email protected]
|
Sorry @tgravescs @kayousterhout I am not aware of the issue's seriousness at that time. thanks @kayousterhout for your coach. |
Are we trying to figure out the top level issue of the race before we get this in? |
I'd like to see a version of this patch that simply removes support for this in standalone mode - at least so we can do the comparison. @li-zhihui, why can't a user just write this in their own code? It seems really simple... |
@pwendell removing support for this in standalone mode is just keeping totalExpectedExecutors zero. I think it just make user use spark more easily. (And sometimes user isn't aware of the problem unless we show them by docs or conf). Anyway, I think you are the authority on how to make the tradeoff. |
Current code waits until some minimum fraction of expected executors have registered before beginning scheduling. The current code in standalone mode suffers from a race condition (SPARK-2635). This race condition could be fixed, but this functionality is easily achieved by the user (they can use the storage status to determine how many executors are up, as described by @pwendell in apache#1462) so adding the extra complexity to the scheduler code is not worthwile.
Okay let me run it by some more people tomorrow and figure it out. |
Rollback old commits, add a new commit base on latest code. |
Can one of the admins verify this patch? |
This PR has gone stale. Do we want to update it? |
@pwendell Do we need the feature in mesos mode? I am pleasure to update it. |
@@ -62,6 +62,11 @@ private[spark] class MesosSchedulerBackend( | |||
|
|||
var classLoader: ClassLoader = null | |||
|
|||
if (!sc.getConf.getOption("spark.scheduler.minRegisteredResourcesRatio").isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sc.conf.contains(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
@pwendell @kayousterhout what is the verdict of this? Should we just remove the ratio altogether? What about backward compatibility? |
Add some new commits to fix code conflict and some issues. |
…ourcesRatio on docs. The configuration is not supported in mesos mode now. See #1462 Author: Li Zhihui <[email protected]> Closes #4781 from li-zhihui/fixdocconf and squashes the following commits: 63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs. (cherry picked from commit 10094a5) Signed-off-by: Andrew Or <[email protected]>
…ourcesRatio on docs. The configuration is not supported in mesos mode now. See #1462 Author: Li Zhihui <[email protected]> Closes #4781 from li-zhihui/fixdocconf and squashes the following commits: 63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs. (cherry picked from commit 10094a5) Signed-off-by: Andrew Or <[email protected]>
…ourcesRatio on docs. The configuration is not supported in mesos mode now. See #1462 Author: Li Zhihui <[email protected]> Closes #4781 from li-zhihui/fixdocconf and squashes the following commits: 63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
@li-zhihui I didn't realize the Given that would you mind closing this? |
In SPARK-1946(PR #900), configuration
spark.scheduler.minRegisteredExecutorsRatio
was introduced(and it was renamedspark.scheduler.minRegisteredResourcesRatio
in PR #1525), but it only support Standalone and Yarn mode.This PR try to introduce the configuration to Mesos mode.
In Mesos coarse-grained mode, the configuration is work well.
In Mesos fine-grained mode, the configuration is ignored because executors are dynamic, and SchedulerBackend will print a warning log if the configuration is set.