-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: reload TLS certificate without restarting server #5516
Conversation
public class FileWatcher implements Runnable { | ||
|
||
private static final Logger log = LoggerFactory.getLogger(FileWatcher.class); | ||
private static final ExecutorService executor = Executors.newFixedThreadPool(1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general design principle best not to rely on statics as they pollute the global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you probably don't need a custom executor at all - you can use a Vert.x worker thread for this.
try { | ||
FileWatcher.onFileChange( | ||
watchLocation, | ||
() -> serverVerticles.forEach(ServerVerticle::restartServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more complex than it needs to be. Instead of individually starting and stopping each http server in each ServerVerticle I think it would be simpler just to add a restart() method to Server which simply stopped the Server and started it again, e.g.:
public void restart() {
stop();
start();
}
I haven't tried this, but seems like it should work (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not work. (I had this initially.) As explained in the PR description:
It's necessary to recreate rather than simply restart the servers because Vert.x only loads TLS certs at the time that the KeyStoreHelper is created, which is only when servers are created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to Server.start/stop not ServerVerticle.start/stop.
If you do that, it will take care of creating new Http Servers for you, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand your suggestion now. We'll still need custom logic to ensure the HTTP servers are restarted with the same ports, in the event that port 0 is specified (as is the case in our tests). It's not clear to me that this is actually simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a commit where I've made your suggested change: vcrfxia@612b215
The new test, TlsTest#shouldReloadCert()
fails because the server ports changed during the restart but the client still tries to connect to the old port.
If we're OK with the ports changing on restart, we can update the test to recreate the client after the restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. The only use case for setting port 0 at the moment is in our tests, so we're not going to worry about preserving ports across restarts. As such, I've moved the restart logic from ServerVerticle to Server as suggested, and updated the test to re-create clients in order to use the most up-to-date port.
@@ -99,10 +100,39 @@ public void stop(final Promise<Void> stopPromise) { | |||
} | |||
} | |||
|
|||
// Creates a new server, rather than simply stopping and restarting, in order to reload TLS certs | |||
public void restartServer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previous comment, seems overcomplex - just stopped and starting the Server should do the trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above -- I initially misunderstood the suggestion. I've made the switch now.
Generally looks good! I think it can be made a little simpler though :) |
Hey @purplefox , now that I've moved the restart logic from ServerVerticle to Server, the newly added test is failing to stop the Server with:
which is weird since start() and stop() are both synchronized, and start() waits for the futures to complete. Seems like we might be hitting eclipse-vertx/vert.x#2049, which hasn't had updates since 2017. Can you spot my mistake? |
Hey @vcrfxia , I had a quick look at the failing test. What I think is happening here is: The undeploy error is a bit of a red herring - this actually happens when the server is closed in the tearDown of the test, not in the restart, and I think it happens because the stop() method in the restart fails because the thread waiting for the futures to complete is interrupted. If you clear the deploymentIds in a finally you will see what I mean, and see the real exception. I suspect the thread is being interrupted (haven't checked this), because a Vert.x worker thread is being used to run the file watcher so Server.stop() is actually being called from a Vert.x worker thread, and the stop() method also shuts down the worker executor! If you change the file watcher to run on a different thread then I'm guessing it will fix it (my bad for suggesting to use a worker thread for this before!). Also, as an aside, we should make sure the file watcher gets shuts down when the rest app is closed. |
Thanks @purplefox , that explanation makes a lot of sense. The test is passing on the latest iteration now, though my approach for allowing the file watcher to restart the server without restarting the file watcher feels rather hacky. Let me know if you've got better ideas. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks fine - if you're in a rush to get this merged I'd say it's ok to merge as is and follow up changes in a different PR, up to you.
public synchronized void shutdown() { | ||
shutdown = true; | ||
log.info("Stopped watching for TLS cert changes."); | ||
if (thread != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any need to interrupt the thread here? If shutdown() is called from this same thread then you set the shutdown flag which means the run() method will return exit ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the redundancy as extra insurance that we wouldn't leak resources (in case something changes with the FileWatcher code and shutdown() is no longer foolproof) but I think that was overkill. I've removed the interrupt() and simplified the Server code as you suggested.
start(true); | ||
} | ||
|
||
private synchronized void start(final boolean startFileWatcher) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the startFileWatcher flag is necessary if you don't interrupt the watcher thread (see previous comment). This means you can simplify things and have a simple start/stop method as before and always create a new filewatcher in start().
stop(true); | ||
} | ||
|
||
private synchronized void stop(final boolean stopFileWatcher) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, parameter is unnecessary
*/ | ||
public synchronized void start() { | ||
log.info("Starting file watcher to watch for changes: " + file); | ||
thread = new Thread(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid this if FileWatcher extends Thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, this is a lot better. Thanks for the suggestion! I thought what I was doing didn't feel right, heh :)
/** | ||
* Stops watching the file and closes the watch service | ||
*/ | ||
public synchronized void shutdown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to be synchronized if we don't have a thread member and shutdown is volatile
try { | ||
callback.run(); | ||
} catch (Exception e) { | ||
log.warn("Hit error callback on file change", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should log at error unless this is expected in normal operation?
); | ||
} finally { // restore cert regardless of failure above so as to not affect other tests | ||
// When: load valid store | ||
ServerKeyStore.loadValidServerKeyStore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to test reloading the server more than once - i.e. changing the cert more than once, as there might be a bug in the filewatcher/reloading that only manifests after the reload has been done once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test already reloads the server twice: once with a bad cert, and then again with a good cert. Are you suggesting we should repeat the cycle twice, for a total of four reloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, if you're already reloading more than once that seems fine :)
/** | ||
* Closes the file watcher | ||
*/ | ||
public void shutdown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @purplefox , now that FileWatcher
extends Thread
, do you think it makes sense to get rid of this shutdown()
method and instead call interrupt()
to terminate the file watcher? I'm wondering whether it's canonical to have a "clean" shutdown method like this, or if interrupt()
is understood to be used for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging PR for now -- will make this improvement in a follow-up (if we think it'd be an improvement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's preferable to shut cleanly without interrupting threads if you can. Interrupting threads is only necessary if the thread is blocked on something that is interruptable. In our case we only call shutdown() from the thread itself so we can guarantee it's not blocked so there's no need to interrupt anything.
If the method was intended to be used elsewhere from a different thread then we should call Thread.interrupt() but that's not the case here.
As a sanity check you could add this in shutdown() to make sure that is the case
if (this != Thread.currentThread()) {
throw new IllegalStateException("Can only be called from file watcher thread");
}
Description
Implements the equivalent functionality the ksqlDB server used to inherit from rest-utils (confluentinc/rest-utils#156) in Vert.x. When a change to the specified watch location is detected, the Vert.x HttpServers are recreated in order to reload the certificates. It's necessary to recreate rather than simply restart the servers because Vert.x only loads TLS certs at the time that the KeyStoreHelper is created, which is only when servers are created.
This PR also ports over the TRUST_STORE_TYPE and KEY_STORE_TYPE configs which were missed in the rest-utils -> Vert.x migration.
Testing done
Unit + manual tests.
Reviewer checklist