Add alternative mechanism to signals for stopping workload executors #77
Comments
Note that this issue is blocking .NET integration. @vincentkam to add a code example that reproduces the issue we are seeing with signals on Windows + .NET.
@prashantmital note that using a tombstone file will not work if the workload executor is implemented as a Docker container. In such scenarios, the container would not have access to that file unless it were explicitly mounted into the container, and even then it may still cause trouble. Another approach might be to use a lightweight RPC over domain sockets/named pipes, or to use something like a value in a MongoDB document. I still think the path of least complexity lies with using system signals, so I'm eager to see @vincentkam's code exemplifying the difficulty with trapping these signals on Windows.
I haven't delved too deeply yet, but I know that on the JVM there is no straightforward way to install a signal handler. There is also no JVM support for domain sockets.
The following sample exemplifies the issues we've been running into in getting signal handling to work with a combination of Cygwin bash + Windows Python + dotnet. The workload-executor is a Cygwin bash script adapted from the Python driver's bash script. It in turn executes the "native" workload executor, which in this case is basically I've commented Program.cs to illustrate the flow a bit better, as I know not everyone may have a Windows box handy, although a spawnhost with the The TL;DR is that it appears that something is terminating the native workload executor before it can finish executing. I suspect it's a Cygwin bash problem because I see similar behavior when using Here is a sample test run using Cygwin bash to invoke
See #79 for a proposed alternate strategy for communicating state between astrolabe and workload-executors.
The current design for this project uses SIGINT (on *nix) and CTRL_BREAK_EVENT (on Windows) to coordinate the shutdown of the workload executor process after maintenance has been successfully run on the Atlas cluster.
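To make the signal-based design concrete, here is a minimal Python sketch of what a workload executor has to do under it. It is not taken from astrolabe or from any driver's executor: the names `stop_requested`, `request_stop`, `install_stop_handler`, and `run_workload` are hypothetical, and the sketch assumes the executor runs its loop on the main thread. On Windows, Python surfaces CTRL_BREAK_EVENT as `signal.SIGBREAK`.

```python
import signal
import sys
import threading

# Set by the signal handler; polled by the workload loop.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only records that a shutdown was requested."""
    stop_requested.set()


def install_stop_handler():
    """Register the handler for the platform's stop signal and return it.

    SIGBREAK exists only on Windows (it is how CTRL_BREAK_EVENT arrives);
    everywhere else the design uses SIGINT.
    """
    signum = signal.SIGBREAK if sys.platform == "win32" else signal.SIGINT
    signal.signal(signum, request_stop)  # must be called from the main thread
    return signum


def run_workload(do_one_operation):
    """Run operations until a stop is requested; return how many ran."""
    count = 0
    while not stop_requested.is_set():
        do_one_operation()
        count += 1
    return count
```

The handler does nothing beyond setting a flag, so the loop can finish its current operation and report results before exiting. Each language needs its own equivalent of `install_stop_handler`, which is where the platform differences discussed in this issue show up.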
Driver authors have to rely on standard APIs provided by their language in order to write a workload executor that conforms to this spec. In practice, this has proven to be easier said than done. To reduce implementation complexity, we should consider providing an alternative mechanism to signals: something that is easier to implement and more platform-independent. An obvious solution would be to have astrolabe write a tombstone file to a predetermined location when maintenance has completed, and to have workload executors periodically check for the existence of this file and terminate once it is found.
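The executor side of the tombstone-file idea could look roughly like the following Python sketch. This is only an illustration under assumptions: the function name `run_until_tombstone`, its parameters, and the polling interval are hypothetical, and the issue does not specify where the file would live or how often it should be checked.

```python
import os
import time


def run_until_tombstone(path, do_one_operation, poll_interval=0.5):
    """Run operations until the file at `path` exists.

    The file is checked at most once per `poll_interval` seconds, so the
    workload is not slowed down by a filesystem call on every iteration.
    Returns the number of operations that were run.
    """
    count = 0
    next_check = time.monotonic()
    while True:
        if time.monotonic() >= next_check:
            if os.path.exists(path):
                return count
            next_check = time.monotonic() + poll_interval
        do_one_operation()
        count += 1
```

A caller would pass the agreed-upon location, for example `run_until_tombstone("/tmp/astrolabe.tombstone", run_one_op)`, where both the path and `run_one_op` are placeholders. The only facilities required are a file-existence check and a clock, which most languages provide in the same way on every platform.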