SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler #634
Conversation
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
Better would be to make this configurable (and increase it - not remove).
A little more detail on the JIRA, but, to summarize, the sleep is to wait for executors to connect. What's confusing to me is that Spark standalone doesn't have a similar sleep. @mridulm is the worry that only some executors would register and that tasks would then unnecessarily get scheduled on nodes where the data isn't local? Have you seen this negatively impact performance? If that's the case, I think the best thing would be to wait until some number of executors have registered.
Ah I see - sorry I didn't read the JIRA. So since we don't know a priori how many executors to expect, I don't think we can wait on any particular condition. Even in YARN mode, AFAIK the executor number is just a request to the RM and not a guarantee. I'm still not 100% sure why this needs to exist. @mridulm is the main concern here just launching non-local tasks? If so, why not just set spark.locality.wait? In standalone mode this may not really be necessary because the executors typically launch in under 3 seconds, which is the default locality wait.
Keep in mind this sleep is after we've already waited to receive a set of containers from the YARN ResourceManager and sent RPCs to the NodeManagers to start them. So we do know how many executors to expect.
@sryza Ah I see, in that case we could wait on the specific condition, but what if e.g. there is a straggler and one container takes a long time to launch (or never launches)?
Yeah, I think we'd want some (probably configurable) max wait time.
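To make the idea above concrete, here is a minimal sketch (not Spark's actual code) of blocking until a fraction of the expected executors has registered, capped by a maximum wait; the function and parameter names are illustrative only.

```scala
// Minimal sketch of "wait for a ratio of executors, but not forever".
// registeredCount is an assumed callback reporting how many executors
// have registered so far; none of these names come from Spark itself.
def waitForRegisteredExecutors(
    registeredCount: () => Int,
    expected: Int,
    minRatio: Double,
    maxWaitMs: Long): Unit = {
  val deadline = System.currentTimeMillis() + maxWaitMs
  while (registeredCount() < expected * minRatio &&
         System.currentTimeMillis() < deadline) {
    Thread.sleep(100) // poll instead of an unconditional fixed sleep
  }
}
```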
Seems reasonable - still curious exactly what the case @mridulm is dealing with is. I guess it's just the case where the containers take a long time to launch (once allocated), much longer than the delay scheduling threshold?
The scheduling and data locality come later. To give an example: suppose we need 200 containers to run a job; as soon as we start, we might get only say 2 or 5 nodes allocated, since it takes some time for YARN to ramp up to a good number. Now, if we start the actual job before sufficient containers have been allocated, we get job failures since a) data can't be cached in memory and b) there is too much load on too few containers. Task locality and scheduling come later - but they are also impacted.
Btw I don't recall exactly why the actual value of 3 seconds is there; I think earlier Spark used to crash in case there were no containers available when the job was launched - thankfully that is not the case anymore!
Note I had filed SPARK-1453 to make the wait configurable, see the discussions there. I'm fine with removing this once the other one is configurable; I would hesitate to do it before that. I don't know the details of spark.locality.wait well enough to know if it works in all cases. I assume spark.locality.wait would work fine as long as you have an input from Hadoop or somewhere else that tells you the locations, but for RDDs without locations waiting wouldn't help?
I like the idea of making the first wait configurable and taking out the second. For the second, we could possibly still wait for the executors we've launched to register. Regarding spark.locality.wait, yes, it needs the locations, but it doesn't need the preferredLocalityData to be explicitly passed in by the user in the way it's needed for requesting where to put executors.
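For readers following along, spark.locality.wait is the delay-scheduling knob being discussed; a minimal sketch of tuning it, assuming the millisecond-valued form of the setting (the value shown is illustrative, not a recommendation from this thread):

```scala
import org.apache.spark.SparkConf

// Sketch: raise the delay-scheduling wait so tasks hold out longer for a
// local executor while containers are still coming up. The default discussed
// above is 3 seconds; the 10000ms value here is purely illustrative.
val conf = new SparkConf()
  .setAppName("locality-wait-example")
  .set("spark.locality.wait", "10000")
```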
@sryza Feel free to take over that jira to make it configurable.
@sryza can we close this and just do SPARK-1453 and/or SPARK-1946?
I guess we could keep this around in case we do SPARK-1946.
@sryza I merged in SPARK-1946; would you want to update this to handle both cluster mode and client mode? I think we can actually also remove the waitForInitialAllocations() call.
QA tests have started for PR 634. This patch merges cleanly.
Posted a patch that takes out the sleeps and waitForInitialAllocations.
QA tests have started for PR 634. This patch merges cleanly.
QA results for PR 634:
QA results for PR 634:
@@ -416,19 +407,8 @@ object ApplicationMaster extends Logging {
// This is to ensure that we have reasonable number of containers before we start
we can remove this whole comment block now.
I just thought of something bad here. Now the default behavior is to wait for 0%, and we have removed the bit of sleep there, so if someone is just using the default they aren't very likely to get many executors before starting. Originally I was thinking to keep the default at 0 so behavior doesn't change, but personally I would rather see it higher so out-of-the-box behavior is better. Thoughts on changing it for YARN? To say 80%?
That's a good point. Changing it for YARN seems like the right thing, and 80% sounds reasonable to me. Another thing is the wait time. Previously it was 6 seconds, but now spark.scheduler.maxRegisteredExecutorsWaitingTime defaults to 5 times that. 30 seconds seems a little excessive to me in general - at least for jobs without caching, after a couple seconds the wait outweighs scheduling some non-local tasks. What do you think about decreasing this to 6 seconds in general? Or at least for YARN.
He had increased it to 30 seconds based on some experiments he did. I guess it depends on how well the scheduler recovers from starting with fewer executors. I know they have been doing a bit of work around this area, but I'm not sure of the real effect. I have definitely seen issues with jobs when they start too soon, so I think I would rather leave it at 30 for now unless we have more concrete data saying it's no longer an issue.
Updated patch sets the min registered executors ratio to 0.8 for YARN.
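For reference, the two knobs being discussed could be set roughly like this; spark.scheduler.maxRegisteredExecutorsWaitingTime is named explicitly above, while the ratio property name here is an assumption based on the thread, and the values simply mirror the 0.8 ratio and 30-second figures mentioned.

```scala
import org.apache.spark.SparkConf

// Sketch of tuning the registration wait discussed above.
// The ratio key name is assumed from the discussion; the wait-time key is
// the one quoted in the thread. Values mirror the 0.8 / 30s figures above.
val conf = new SparkConf()
  .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
  .set("spark.scheduler.maxRegisteredExecutorsWaitingTime", "30000")
```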
QA tests have started for PR 634. This patch merges cleanly.
QA results for PR 634:
|
override def postStartHook() {
    super.postStartHook()
In yarn-client mode, super.postStartHook() should not be deleted, because it calls waitBackendReady when the configured ratio of executors has not yet registered.
YarnClientClusterScheduler extends TaskSchedulerImpl, so it will just fall through to TaskSchedulerImpl.postStartHook() which calls waitBackendReady
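A stripped-down sketch of the fall-through being described; the class names and method bodies here are placeholders, not the actual Spark classes:

```scala
// Illustrative only: the base scheduler's postStartHook blocks until the
// backend reports ready, and a subclass that calls super still gets that wait.
class BaseScheduler {
  def postStartHook(): Unit = waitBackendReady()
  protected def waitBackendReady(): Unit = {
    // in the real code this blocks until enough executors have registered
  }
}

class ClientModeScheduler extends BaseScheduler {
  override def postStartHook(): Unit = {
    // client-mode-specific setup would happen here
    super.postStartHook() // falls through to waitBackendReady()
  }
}
```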
Looks good. +1, Thanks @sryza.
Author: Sandy Ryza <[email protected]> Closes apache#634 from sryza/sandy-spark-1707 and squashes the following commits: 2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN 354c630 [Sandy Ryza] Remove outdated comments c744ef3 [Sandy Ryza] Take out waitForInitialAllocations 2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler