-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3293] yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client #2311
Conversation
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
You will have to rebase your patch with master for tests to pass. |
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
Jenkins, retest this please. |
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
System.setSecurityManager(new java.lang.SecurityManager() { | ||
override def checkExit(paramInt: Int) { | ||
if (!stopped) { | ||
throw new NoExitsException(paramInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of throwing the exception, why not just set exitStatus
directly and avoid having a custom exception? You can also use that as a trigger to call "finish()" if it hasn't been called yet, with a nice error message about the app having called System.exit
.
That avoids messing with the app's exception handling, since very few people realize System.exit
can throw and thus don't really plan for it.
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
Jenkins, retest this please. |
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
@@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments, | |||
} else { | |||
runExecutorLauncher(securityMgr) | |||
} | |||
|
|||
if (finalStatus != FinalApplicationStatus.UNDEFINED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're changing this logic, you need to modify runExecutorLauncher
so that it correctly reports an error on failure. Right now it will always exit with 0 and will not report the final status to the RM.
I've been having a multiple different issues with the success/failure reporting (SPARK-3627). Does this handle all of those? |
@tgravescs this change should handle uncaught exceptions and explicit |
@@ -29,6 +29,7 @@ import org.apache.hadoop.yarn.api._ | |||
import org.apache.hadoop.yarn.api.records._ | |||
import org.apache.hadoop.yarn.conf.YarnConfiguration | |||
|
|||
import org.apache.spark.executor.{ExecutorUncaughtExceptionHandler, ExecutorExitCode} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these used anywhere?
Test FAILed. |
Jenkins, retest this please. |
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
Test PASSed. |
QA tests have started for PR 2311 at commit
|
QA tests have finished for PR 2311 at commit
|
Test FAILed. |
@witgo There are cases not covered by this pr that I would like to fix up dealing with exit codes and final status. Would you mind if I just pull these changes into mine? I want to try to fix it up across the board. I hope to have something up tomorrow if all the testing goes well. |
Ok, no problem. |
also maybe I'm missing something but I don't see how the changes help in yarn-client mode? The changes you made were on the application master running on a separate host then the driver itself, startUserClass is only used in cluster mode, and when the disconnect happens in client mode we always report success. |
Yes, this does not involve |
No description provided.