-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YARN][SPARK-3293]Fix yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client #3508
Conversation
Can one of the admins verify this patch? |
Please note that YARN-2091 changed the error code: https://issues.apache.org/jira/browse/YARN-2091 |
@oza Thx for your notification, but this PR is just report the AM status to RM and let web show right status. And Implementation of the status report is done by AM nowadays. |
@SaintBacchus I see, I got it. |
* This allows us to catch that and properly set the YARN application status and | ||
* cleanup if needed. | ||
*/ | ||
private def setupSystemSecurityManager(amActor: ActorRef): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The securityManager in the AM was causing a performance impact and we just removed it. I expect the same issue to happen here. #3484
Hey @SaintBacchus as @tgravescs pointed out we can't use Java's security manager here until we understand the exact cause of the performance regression that entails. I think for the 1.2 release we should just have broken exit codes. For this patch it would be good to just fix the exception case if it's possible to do so without Java's security manager. By the way, I wouldn't block the 1.2 release on this, so if you don't have spare cycles then don't worry about rushing this in. |
Hey @andrewor14 , it seems that using security manager may cause a performance impact as https://issues.apache.org/jira/browse/SPARK-4584 said. |
By your suggestion @andrewor14 , do I just fix the two situations: user use sc.stop then report success and when exception occurs report failed? Is it right? |
Yeah the performance regression was difficult to reproduce according to @vanzin. I haven't been able to reproduce it myself but others have. Yes, this patch should only deal with the exception case since we can't safely and efficiently deal with exit codes. |
Just to reinforce, we shouldn't add a security manager without better understanding what impact it has. Feel free to file a bug if you'd like this investigated, but I like the policy of "don't use System.exit()" better. In any case, it seems your patch will fail apps that do a |
Yeah I think what we want from this patch is the following behavior:
I believe this is possible to do without the security manager. In the long run, the ideal behavior we want is:
But there is no easy way to do this without the security manager. |
Ok, I close this PR for so many meanless commits and create a new PR to reslove this prolem. |
spark has five exit situations:
1.system.exit(0) - succeeds
2.clean exit main -succeeds
3.sc.stop - succeeds
4.system.exit(1) - failed
5.exception - failed
But now yarn-client's ExecutorLauncher is not care about this and just report Ok when driver is down.
So I catch this five situations in driver and send the relevant status to ExecutorLauncher to reply the AM status.
In addition, ExecutorLauncher does not need retry if it report the failed status because this driver has been ended before the sencond ExecutorLauncher launched.