-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7451][YARN] Preemption of executors is counted as failure causing Spark job to fail #5993
Conversation
Can one of the admins verify this patch? |
@@ -373,7 +373,11 @@ private[yarn] class YarnAllocator( | |||
// Hadoop 2.2.X added a ContainerExitStatus we should switch to use | |||
// there are some exit status' we shouldn't necessarily count against us, but for | |||
// now I think its ok as none of the containers are expected to exit | |||
if (completedContainer.getExitStatus == -103) { // vmem limit exceeded | |||
if (completedContainer.getExitStatus == ContainerExitStatus.PREEMPTED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this constant exist all the way back to Hadoop 2.2? IIRC preemption was added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the exit status has it back in 2.1.1-beta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin yes its been there since 2.1.1-beta. Here is the JIRA which added that constant: https://issues.apache.org/jira/browse/YARN-1049
Note that the AllocateResponse also contains PreemptionMessage which is list of containers RM is asking for back. AM could potentially use this to decide which containers to kill. That can be future enhancement though. I think this looks good other then @sryza comments. |
Thanks all for reviewing ! |
Yeah I'm sure it would require more plumbing hence a future enhancement if we decide its useful. |
LGTM |
Looks like we never ran tests for this. Jenkins test this please |
Merged build triggered. |
Merged build started. |
Test build #32272 has started for PR 5993 at commit |
LGTM2 |
Test build #32272 has finished for PR 5993 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
Committing this |
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes #5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure (cherry picked from commit b6c797b) Signed-off-by: Sandy Ryza <[email protected]>
Thanks for committing this, Sandy ! |
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on.
@andrewor14