Wait for daemon to fully terminate in "stop" #276
Conversation
os.kill(pid, SIGTERM)
process.send_signal(SIGTERM)
while process.is_running():
    time.sleep(0.25)
What about using `os.kill(pid, 0)` to check whether the PID still exists, instead of adding the `psutil` dependency?
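A minimal sketch of that suggestion, using only the standard library; the function names and the 0.25-second interval are illustrative, not part of this PR:

```python
import errno
import os
import signal
import time

def pid_exists(pid):
    """Signal 0 delivers nothing, but still checks whether the PID is valid."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # ESRCH: no such process; EPERM: it exists but we can't signal it.
        return exc.errno != errno.ESRCH
    return True

def stop_and_wait(pid):
    os.kill(pid, signal.SIGTERM)
    while pid_exists(pid):
        time.sleep(0.25)
```

One caveat: signal 0 only checks that the PID exists, so a lingering zombie or a reused PID can keep the loop spinning, which is part of what `psutil` handles more carefully.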
done
There's another problem with this approach. I just realized that when MO shuts down a cluster it also removes the server's log file and dbpath. If we merge this change then I would expect there to be no log file to upload in patch builds.
What if you copy the log files you're interested in before shutting down the cluster?
Are you sure that's the case? The log and db files seem to stick around for me after I shut the cluster down. When I start up a new cluster the old ones are deleted, however.
Yes, the order is:
When I run
Hm, something must be messed up on my end. Well, given that, I don't think we can merge this patch, since it would seemingly break everyone's log storage. I think, as you suggest, we'll have to take an approach outside of mongo-orchestration. I originally planned on doing this by adding a separate script to drivers-evergreen-tools that issues a `curl DELETE`; doing that wouldn't cause MO to delete the log/data files too, right?
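For illustration, a rough sketch of what such a script's DELETE call could look like; the port (8889) and the `/v1/replica_sets/<id>` path are assumptions based on mongo-orchestration's usual defaults, and `repl0` is a hypothetical cluster id:

```python
import urllib.request

# Hypothetical id of the cluster to tear down.
cluster_id = "repl0"

# mongo-orchestration's HTTP API typically listens on port 8889.
req = urllib.request.Request(
    f"http://localhost:8889/v1/replica_sets/{cluster_id}",
    method="DELETE",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```

Whether this preserves the log/data files is exactly the question raised above.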
Also, what do you think about like a
DELETE would also remove the log files. MO deletes the files anytime it shuts down a cluster:
I think a simpler solution might be to copy the files you're interested in before uploading them. Would that avoid the tar failures you're seeing?
It wouldn't break Python's log upload task because we upload the logs before shutting down mongo-orchestration:
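A sketch of the copy-before-shutdown idea; both directory paths below are placeholders rather than paths this project actually uses:

```python
import shutil
import subprocess

# Copy the server logs somewhere the upload task can find them...
shutil.copytree("/data/mo/logs", "/tmp/mo-artifacts", dirs_exist_ok=True)

# ...then it no longer matters whether MO deletes the originals on shutdown.
subprocess.run(["mongo-orchestration", "stop"], check=True)
```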
@ShaneHarvey I think the whole problem is that those files might still be written to, and
Here's a patch confirming what @mbroadst said. I think most drivers run "upload-mo-artifacts" in
Yeah, I think it's fine to not clean up the files after shutting down the clusters in the `cleanup_storage` signal handler.
So I discovered why the logs were apparently not being deleted on my machine: in the code snippet you pasted above (
How about we make this behavior official and remove the attempted log path deletion from cleanup? The nice thing is that since this won't introduce any behavioral change (the bug has been there for 7+ years), we know we won't break anyone's Evergreen configs. Then I won't need to modify anything else and can just use the blocking-stop changes already made thus far, in combination with a custom log path.
SGTM. Can you open a follow-up issue to not delete the default logpath (
LGTM
One note: I've noticed that shutting down clusters can take a long time, especially with newer server versions. If that proves to be the case in Evergreen, we may need to optimize
I wasn't able to run the entire test suite locally due to issues similar to those found in #273, but I did run the drivers-evergreen-tools and Swift driver Evergreen builds with this patch and they passed fine. As you pointed out, tasks that use sharded clusters on latest can take up to 30s longer now that we wait for shutdown, but in my experience that is usually only a small percentage of a task's total runtime. Do you think that's sufficient testing, or should I try some more drivers?
Yes, I think that's fine. I opened #278 to make
It looks like I don't have write access to this repository, so could you squash/merge it for me?
Right now, `mongo-orchestration stop` on non-Windows platforms simply sends a signal to the mongo-orchestration daemon and exits immediately; in the background, the mongo child processes and the daemon itself shut down after a brief period. This PR updates the "stop" command to wait until the daemon has fully shut down before returning, effectively causing `mongo-orchestration stop` to block until both the daemon and the cluster are no longer running.

As part of this change, I added a dependency on `psutil` to facilitate polling whether the process is still alive. If adding a dependency is considered too costly, that functionality could be reimplemented in mongo-orchestration; the library does seem like it could be useful in other areas of the codebase, however.

I was having trouble running the full test suite (with or without my changes) due to `test_replica_sets.ReplicaSetAuthTestCase` hanging, so unfortunately I do not know whether this patch passes completely. I did test it manually and it seems to work, though.
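For readers skimming the diff, a minimal sketch of the blocking "stop" described above, assuming the daemon's PID has already been read from its pidfile; the function name and poll interval here are illustrative rather than the PR's exact code:

```python
import time
from signal import SIGTERM

import psutil

def stop_daemon(pid, poll_interval=0.25):
    """Send SIGTERM to the daemon and block until it has fully exited."""
    try:
        process = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return  # Nothing to stop.

    process.send_signal(SIGTERM)
    while process.is_running():
        time.sleep(poll_interval)
```

`psutil.Process.is_running()` also guards against PID reuse, which is the main thing a bare `os.kill(pid, 0)` loop does not give you.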