
Wait for daemon to fully terminate in "stop" #276

Merged: 4 commits, Jan 30, 2020

Conversation

@patrickfreed (Contributor)

Right now, mongo-orchestration stop on non-Windows platforms simply sends a signal to the mongo-orchestration daemon and exits immediately. In the background, the mongo child processes and the daemon itself will shut down after a brief period.

This PR updates the "stop" command to wait until the daemon has fully shut down before returning, effectively causing mongo-orchestration stop to block until both the daemon and the cluster are no longer running.

As part of this change, I added a dependency on psutil to facilitate polling whether the process is still alive or not. If adding a dependency is considered costly, that functionality could be reimplemented in mongo-orchestration. The library does seem like it could be useful in other areas of the codebase, however.
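Sketched, the blocking wait described above looks roughly like this (the function name, polling interval, and error handling are illustrative, not the PR's exact code):

```python
import time
from signal import SIGTERM

import psutil


def stop_daemon(pid, interval=0.25):
    """Signal the daemon and block until it has fully exited (sketch)."""
    try:
        proc = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return  # already gone, nothing to stop
    proc.send_signal(SIGTERM)
    # is_running() guards against PID reuse by comparing creation times.
    # Caveat: it reports True for zombies, which can't happen here because
    # the daemon is not a child of the "stop" process.
    while proc.is_running():
        time.sleep(interval)
```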

I was having trouble running the full test suite (with or without my changes) due to test_replica_sets.ReplicaSetAuthTestCase hanging, so unfortunately I do not know if this patch passes completely. I did test it manually and it seems to work, though.

@patrickfreed changed the title from "Wait for child mongo processes to terminate in "stop"" to "Wait for daemon to fully terminate in "stop"" (Jan 21, 2020)

```diff
-os.kill(pid, SIGTERM)
+process.send_signal(SIGTERM)
+while process.is_running():
+    time.sleep(0.25)
```
@ShaneHarvey (Collaborator), Jan 21, 2020

What about using os.kill(pid, 0) to check if the PID still exists instead of adding the psutil dependency?
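A minimal sketch of that suggestion, using only the standard library (the function name and interval are illustrative). Signal 0 performs error checking only, so `os.kill` raises `ProcessLookupError` once the PID no longer exists:

```python
import os
import time


def wait_for_exit(pid, interval=0.25):
    """Block until no process with this PID exists (sketch)."""
    while True:
        try:
            os.kill(pid, 0)  # signal 0: existence check only
        except ProcessLookupError:
            return  # PID is gone; the process has fully terminated
        except PermissionError:
            pass  # PID exists but belongs to another user
        time.sleep(interval)
```

One caveat: a terminated child of the current process keeps its PID until it is reaped, so this check is only reliable for non-child processes like the daemon here.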

@patrickfreed (Contributor, Author)

done

@ShaneHarvey (Collaborator) left a comment

There's another problem with this approach. I just realized that when MO shuts down a cluster it also removes the server's log file and dbpath. If we merge this change then I would expect there to be no log file to upload in patch builds.

What if you copy the log files you're interested in before shutting down the cluster?

@patrickfreed (Contributor, Author)

Are you sure that's the case? The log and db files seem to stick around for me after I shut the cluster down. When I start up a new cluster the old ones are deleted, however.

@ShaneHarvey (Collaborator)

Yes the order is:

When I run mongo-orchestration stop, the log files are gone after the daemon exits.

@patrickfreed (Contributor, Author)

Hm, something must be messed up on my end. Well, given that, I don't think we can merge this patch, since it would seemingly break everyone's log storage. I think, as you suggest, we'll have to take an approach outside of mongo-orchestration.

I originally planned on doing this by adding a separate script to drivers-evergreen-tools that issues a `curl DELETE`; doing that wouldn't cause MO to delete the log/data files too, right?

@patrickfreed (Contributor, Author)

Also, what do you think about a `--preserveMongoData` option that would toggle that cleanup behavior?

@ShaneHarvey (Collaborator)

DELETE would also remove the log files. MO deletes the files anytime it shuts down a cluster:

```
$ head /var/folders/lm/b1r2f8p503xg40r6x2rqv7fr0000gp/T/mongo-n3d4Jr/mongod.log
2020-01-21T14:36:25.702-0800 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-01-21T14:36:25.703-0800 W  ASIO     [main] No TransportLayer configured during NetworkInterface startup
2020-01-21T14:36:25.708-0800 W  ASIO     [main] No TransportLayer configured during NetworkInterface startup
2020-01-21T14:36:25.709-0800 D1 NETWORK  [main] fd limit hard:524288 soft:262144 max conn: 209715
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] MongoDB starting : pid=35243 port=27017 dbpath=/var/folders/lm/b1r2f8p503xg40r6x2rqv7fr0000gp/T/mongo-n3d4Jr 64-bit host=Shanes-MacBook-Pro-2.local
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] db version v4.3.2-594-g24a8f59
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] git version: 24a8f59e21490f4dd101224ba01872d773c7f25e
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] allocator: system
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] modules: enterprise
2020-01-21T14:36:25.709-0800 I  CONTROL  [initandlisten] build environment:
$ curl -XDELETE http://localhost:8889/servers/e82cbabf-e6e8-406a-be80-2e8ee9c0fcac
$ head /var/folders/lm/b1r2f8p503xg40r6x2rqv7fr0000gp/T/mongo-n3d4Jr/mongod.log
head: /var/folders/lm/b1r2f8p503xg40r6x2rqv7fr0000gp/T/mongo-n3d4Jr/mongod.log: No such file or directory
```

@ShaneHarvey (Collaborator)

I think a simpler solution might be to copy the files you're interested in before uploading them. Would that avoid the tar failures you're seeing?

> Well, given that, I don't think we can merge this patch, since it would seemingly break everyone's log storage

It wouldn't break Python's log upload task because we upload the logs before shutting down mongo-orchestration:

```yaml
  - func: "upload mo artifacts"
  - func: "upload test results"
  - func: "stop mongo-orchestration"
```

@mbroadst (Member)

@ShaneHarvey I think the whole problem is that those files might still be written to, and tar would error out because bytes were changing while it was trying to do its job. I think it's reasonable that we try to find a more comprehensive solution to this problem.
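One way to sidestep the changing-bytes problem is to snapshot the logs into a quiescent directory before archiving them, so tar never reads a file mongod is still appending to. A sketch, where the `MO_LOG_DIR` variable, default path, and archive name are all illustrative rather than drivers-evergreen-tools conventions:

```shell
set -e
SRC="${MO_LOG_DIR:-/tmp/mo}"   # illustrative location of the dbpaths
SNAP="$(mktemp -d)"
# cp reads each log exactly once; writes after the copy can't affect tar.
# (A sharded setup would need unique per-node names; this is a sketch.)
find "$SRC" -name 'mongod.log' -exec cp {} "$SNAP" \; 2>/dev/null || true
tar czf mongodb-logs.tar.gz -C "$SNAP" .
```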

@patrickfreed (Contributor, Author) commented Jan 23, 2020

Here's a patch confirming what @mbroadst said. I think most drivers run "upload-mo-artifacts" in post (which suppresses failures), so they don't see these errors. I was hoping that the logs being append-only while upload-mo-artifacts ran would prevent the issue, but that appears not to be the case.

@ShaneHarvey (Collaborator)

Yeah, I think it's fine not to clean up the files after shutting down the clusters in the `cleanup_storage` signal handler.

@patrickfreed (Contributor, Author)

So I discovered why the logs were apparently not being deleted on my machine:

In the code snippet you pasted above (`cleanup_mprocess`), it attempts to delete `cfg["logPath"]`, but the log path is actually stored at `cfg["logpath"]`, not in camel case (see here). By default, the log files are written to `dbpath/mongod.log`, so if you don't specify a log path it does usually get deleted. The configurations I use specify log paths, so the logs don't get deleted.
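The mismatch is easy to reproduce in isolation. In the sketch below the dict mimics a server config; the keys mirror the described mismatch, while the values are illustrative:

```python
# mongod's own option is lowercase "logpath"; the cleanup code looked it
# up as camelCase "logPath" and silently got nothing back.
cfg = {"logpath": "/var/log/mongod.log", "dbpath": "/data/db"}

log_path = cfg.get("logPath")   # camelCase lookup misses the key
assert log_path is None         # so the log file is never deleted

assert cfg.get("logpath") == "/var/log/mongod.log"
```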

How about we make this behavior official and remove the attempted log-path deletion from cleanup? The nice thing is that since this won't change any behavior (the bug has been there for 7+ years), we know we won't break any Evergreen configs. Then I won't need to modify anything else and can just use the blocking-stop changes already made, in combination with a custom log path.

@ShaneHarvey (Collaborator)

SGTM. Can you open a follow-up issue to not delete the default logpath (`dbpath/mongod.log`) on shutdown too?

@ShaneHarvey (Collaborator) left a comment

LGTM

@ShaneHarvey (Collaborator)

One note: I've noticed that shutting down clusters can take a long time, especially with newer server versions. If that proves to be the case in Evergreen, we may need to optimize stop to shut down the clusters forcefully with SIGTERM instead of using the shutdown command.

@patrickfreed (Contributor, Author)

I wasn't able to run the entire test suite locally due to similar issues found in #273, but I did run the drivers-evergreen-tools and Swift driver Evergreen projects with this patch and they passed fine. As you pointed out, tasks that use latest sharded clusters can take up to 30s longer now that stop waits for shutdown, but in my experience that is usually only a small percentage of a task's total runtime. Do you think that's sufficient testing, or should I try some more drivers?

@ShaneHarvey (Collaborator)

Yes I think that's fine. I opened #278 to make mongo-orchestration stop faster.

@patrickfreed (Contributor, Author)

It looks like I don't have write access to this repository, so could you squash/merge it for me?
