-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10911] Executors should System.exit on clean shutdown. #9946
Conversation
Jenkins, this is ok to test. |
Test build #46627 has finished for PR 9946 at commit
|
See the JIRA. I still don't believe this is safe. At least, causes more problems than it seems to solve. |
@srowen do you know of any actual use cases this will break? We've finished running the user code and are exiting anyway so things should just be shutting down. At this point it shouldn't be committing anything and under normal circumstances if it takes to long its going to get shot anyway. If we know it will affect stuff I'm fine with leaving it out because under normal circumstances the cluster manager handles removing these. But this particular condition happened when there was an issue with the cluster manager as well. Leaving around tasks is bad and anything we can do to protect from users in my opinion is good. |
But isn't the scenario here that the user app isn't done, because it spawned non-daemon threads that are doing something? I agree it's not good practice, but if apps avoided doing this entirely we'd have no problem to begin with. The question is what to do if such a user thread does exist and runs long. Killing it immediately could cause problems, and it's not terribly theoretical: imagine persisting data to disk or writing to a socket and failing to write all the bytes. Not-killing it of course opens the possibility that the thread never stops at all. Which is worse? The long-running user thread is the app's "fault" and is pretty easy to debug by looking at a stack dump. On the other hand killing threads straight away could cause problems for a sort of reasonably behaving app. (Also imagine this non-daemon thread could be in library code.) Maybe some kind of timeout mostly mitigates the issue, but then I think it's just trying to save a fairly clearly misbehaving app from itself at a non-trivial cost. You're always going to have the possibility of a stuck process (deadlock, infinite loop in a driver, etc) and need to be able to kill that if needed as an admin. |
Well the users shouldn't have been using non-daemon threads in the first place, they spawned them while a task was running and then never cleaned them up. We asked them to change that. I can't think of any scenarios where the user should be relying on anything running past when the users task code exits as it would just be a natural race condition. the task finishes and the driver could kill the container, it could finish the app and yarn could kill the container, etc. whatever they are doing may or may not finish anyway. |
@zhuoliu given the discussion in SPARK-10911, could you write a proper PR description and reference HADOOP-12441, which is why this is needed on the YARN side? |
@vanzin So the bug is one thing it can actually happen in other ways too. For instance if someone messes up the node while upgrading the NM during rolling upgrade. Or it could happen if there is a bug in other resource managers (standalone, mesos, etc). I'm fine with updating the description but don't think its limited to HADOOP-12441. It should system exit to make sure users process really exits. |
Sure, it's more for compleness; explaining why you need to explicitly exit, since I'd expect the cluster manager to kill those containers when the app finally finishes, yet there are cases when that doesn't seem to happen. |
I agree with @srowen. Just ending the executor can pose regressions in behavior that are difficult to debug. If the concern is that executor processes are undying, then setting a timeout of some reasonable time before calling |
@vanzin @srowen @andrewor14 , I think your concerns make sense. Also, adding timeout might have another issue that resources cannot be released immediately even if tasks have completed. So for now, we may just close this pull request and the issue, and reopen it if the same problems repeat for other users. |
@zhuoliu OK though you will have to close it |
Thanks, closed. |
Just to point out here I may re-open this. I would still rather fix this, a known bug that I can reproduce then worry about a theoretical it might break something. Unless of course someone can give me a hard use case that this breaks. At the point we call System.exit here all user code is done and we are terminating. If there is something you know of that the user should be allowed to do after this then we should create something like a shutdown hook so the user can properly clean it up. A timeout would probably work here but I'm not fond of it here, it is going to delay it for everyone and you don't give the resources back as quickly. You could also end up in the same situation where if you choose a timeout, say 3 seconds and if this theoretical user code didn't finish within that 3 seconds you would kill it anyway. Actually thinking about this more, maybe this is a perfect time as we can put it in 2.0 so change in behavior is more reasonable. thoughts ? |
Also note System.exit still goes through the normal JVM shutdown hooks, etc. |
I'm not against this change; in fact, this is already the current behavior in healthy YARN installs (since YARN will kill your container). I just wanted the change description to be more descriptive of the underlying reason why the exit is being explicitly added. I also don't buy the existence of any valid user case for a user thread to block container exit, exactly for the reason above. |
@andrewor14 @srowen thoughts or concerns? |
I disagree with: "At the point we call System.exit here all user code is done and we are terminating" It should ideally be so, but, what happens when it isn't? A user shutdown hook could be the thing still executing (I think?) so I don't know if that's a solution. It may be a common sin among fairly righteous apps, as I can imagine all kinds of third party libraries taking some short time to shutdown their pools and flush their whatever to disk. Uncommonly, something pathologically runs forever -- bad app. There's no way to distinguish them. Killing the JVM risks some bad end for the app's cleanup; not killing it risks a stuck JVM. A timeout doesn't solve it; the implicit timeout of 0 here is the most extreme choice in one direction. My gut is that it's better to avoid possibly harming several fairly innocent apps with this behavior change, understanding it means more manual work to chase down and kill the occasional errant bad app. I'm OK being out-voted too, but given the discussion so far that remains my take. |
What happens is that the user code will die anyway because YARN will kill the container. The explicit exit here is to work around a bug in certain versions of YARN (and some broken cluster configurations, e.g., unhealthy NMs). |
Actually, what's the difference between letting I see you're saying it isn't really a behavior change. In the scenario here, the JVM would still be running because some non-daemon thread is alive, so how does YARN decide to forcibly kill it -- it's because the app has already told YARN it's done but hasn't completed? There still seems to be a material difference between always immediately killing threads at the end, and maybe getting killed some short time later, since the former will always occur in normal operation. Could we emulate the delay as well as better-than-nothing? |
You get shutdown hooks when the JVM exits (nor matter how, except
Precisely.
Why? What would that really help with? Any user application relying on "container stays alive after YARN application is deemed finished" is doing something wrong, and cannot assume anything about how long containers will live after the app is finished. I still haven't seen a single, valid use case for not killing these non-daemon threads. |
I'm not suggesting that behavior is correct, but equally you could say an app that never terminates is doing something wrong or a YARN that doesn't stop it is doing something wrong. The lesser-of-two-evils 'use case' is as I say above: some app's cleanup fails not infrequently in an inconsistent state since it was immediately halted and that has its cost as does letting an app get stuck. Simply, which is worse? |
But that's the case. It's a bug in YARN. See https://issues.apache.org/jira/browse/HADOOP-12441.
If the app fails in that way, it is broken. The cleanup should be done before the driver's {{main()}} returns; otherwise, the app has relinquished control back to Spark / the cluster manager, and anything can happen. |
so I haven't seen any use cases this would break. I would argue is they are relying on this behavior its a bug in the user code. They should be using shut down hook or other and that still work with system.exit. If they are doing this now they are getting undeterministic results because YARN and other cluster managers could shoot them at any time. Personally I think inconsistent behavior is worse as it is much harder to debug. |
I don't think this is quite addressing my point, but at best I'm asserting there's a choice between which "bad" behavior you want to deal with, not that either behavior is "OK". Exiting immediately might harm apps that are unintentionally bad (e.g. third party libs flushing); then again tackling stuck apps is protecting the cluster. I'd just reassert the above, but this is a matter of judgment and being outvoted seems OK to me. |
Sorry Sean, I don't see how we're not addressing your comment. Without the change, the behavior you're concerned about already exists, because YARN kills containers when that bug is not present. It's, for all practical effects, exactly the same as calling |
Sure, but it becomes much more likely to bite if you always kill the threads immediately, if they're still running, rather than have them killed a little bit later by YARN. Are you saying YARN immediately kills the container here? then in practice agree. Otherwise this seems like a coherent concern; you may not think it's worth worrying about but it's not hard to grok |
I'm saying that it's the user's fault if his application depends on that non-predictable behavior. We're talking milliseconds here, that might be affected by everything from YARN's internal code to network delays to kernel scheduling. There's absolutely no argument for someone depending on that for the correct behavior of their application. |
@zhuoliu please re-open this. |
Now reopen. |
I'm on board with this if it's true that YARN virtually immediately kills a JVM like this if it's not done by the time the NM thinks it's done. Then indeed regardless of what it does to an app that's tardy in cleaning up, it's not a materially different behavior. If it helps handle another bad app behavior, there's no downside to doing that. |
+1. I'll give this a bit for others to look at and make sure we are done discussing. |
I still would like to see a better commit message (the link to JIRA is implied by the title so is redundant in a commit message). |
Hi @vanzin , do we want to amend the commit message to something like this? |
@zhuoliu that sounds great. Just edit your very first comment (that's the commit message). |
Thanks @vanzin , commit message updated. |
Huh, no. You have to edit the very first comment on this page, not fix the commit message on your github branch. |
Sorry for that. Just updated the first comment and changed the commit message back to original. |
Great, thanks! LGTM. |
Test build #49891 has finished for PR 9946 at commit
|
Can you make a comment to give a summary of the discussion? I am not sure that the commit message has the enough info. |
Sure. "At the point we call System.exit here all user code is done and we are terminating. If there is something you know of that the user should be allowed to do after this then we should create something like a shutdown hook so the user can properly clean it up. "I also don't buy the existence of any valid user case for a user thread to block container exit," "I'm saying that it's the user's fault if his application depends on that non-predictable behavior. We're talking milliseconds here, that might be affected by everything from YARN's internal code to network delays to kernel scheduling. There's absolutely no argument for someone depending on that for the correct behavior of their application."' "I'm on board with this if it's true that YARN virtually immediately kills a JVM like this if it's not done by the time the NM thinks it's done. Then indeed regardless of what it does to an app that's tardy in cleaning up, it's not a materially different behavior. If it helps handle another bad app behavior, there's no downside to doing that." |
Call system.exit explicitly to make sure non-daemon user threads terminate. Without this, user applications might live forever if the cluster manager does not appropriately kill them. E.g., YARN had this bug: HADOOP-12441.