[Elastic Agent] Provide a better mechanism to update the Certificates used by Agent #4557
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
A few thoughts on how we could accomplish this: In the Fleet UI I believe you can only change the Fleet Server hosts; I don’t think we expose the TLS settings. So there’s no way to push TLS updates to the agent remotely at all; you have to do it from the command line. We need to fix this. As long as the certificates that changed aren’t in the system certificate store, I think we can reload on the fly by just breaking the connection to Fleet and recreating it, per golang/go#35887. Also per golang/go#35887, if the change the user needs to make is to the system certificate store, then we need to restart, so being able to remotely restart agents would help here. I think this would also cover case 2, and reloading without restarting just becomes an optimization. |
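To make the "break the connection and recreate it" idea concrete, here is a minimal Go sketch. It assumes the new CA arrives as PEM bytes and is not taken from the system store. `reloadableClient`, `newPoolFromPEM`, and `Reload` are hypothetical names for illustration, not Elastic Agent code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"sync"
)

// reloadableClient is a hypothetical wrapper around an HTTP client whose
// trusted CAs can be swapped at runtime. It is not an Elastic Agent type.
type reloadableClient struct {
	mu     sync.Mutex
	client *http.Client
}

// newPoolFromPEM builds a CA pool containing only the given PEM certificates.
func newPoolFromPEM(pemBytes []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no valid certificates found in PEM data")
	}
	return pool, nil
}

// Reload builds a new client that trusts the given CAs, swaps it in, and
// closes the old client's idle connections so that later requests perform
// a fresh TLS handshake. On error the current client is left untouched.
func (r *reloadableClient) Reload(pemBytes []byte) error {
	pool, err := newPoolFromPEM(pemBytes)
	if err != nil {
		return err
	}
	next := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
	}}

	r.mu.Lock()
	old := r.client
	r.client = next
	r.mu.Unlock()

	if old != nil {
		// Only idle connections are closed; in-flight requests finish
		// on the connection they already hold.
		old.CloseIdleConnections()
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: reload <path-to-ca.pem>")
		return
	}
	pemBytes, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	var rc reloadableClient
	if err := rc.Reload(pemBytes); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	fmt.Println("client now trusts the CA(s) in", os.Args[1])
}
```

The sketch covers only CAs supplied explicitly. A change to the system certificate store is the separate case discussed above, where a restart may still be needed.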
Just to confirm: I assume this is only about updating the CA the agent uses to verify Fleet? |
It is updating the CA agent uses to verify it is talking to a trusted Fleet server. |
Ok yeah, sounds like we need to allow Fleet to update allowed CAs via Policy with a big warning message that says you can quite easily blow up your entire environment doing so? |
The approach we have taken with similar things like updating proxy settings and the soon to be supported mTLS settings is to have the agent test that it can still reach Fleet server before committing and making the configuration change permanent. Hopefully we can do something similar with the CA assuming both are usable at the time the switch is made. |
@cmacknz @strawgate question: if the CA is changed in this manner, does it affect already established connections? Isn't this CA used only during the handshake? There's a mention of agent restarts, which suggests otherwise. |
It sounds like the Agent loads certificates from the system store on startup. This causes an edge case where the administrator has recently loaded a new CA into the system store and the Agent loads a policy where the user is trying to use that certificate from the store. The Agent won't see the new certificate in the store unless it was restarted between the time the administrator added the certificate to the store and the time it loads the policy that points at the certificate. If, upon updating the certificate used in the Fleet policy, the Agent breaks its connection to Fleet and starts a new one (per @cmacknz's comment above), the new connection attempt would find that the CA cert it's supposed to use is not in its in-memory cache of the system store. @cmacknz does this match your understanding? |
Unless custom CAs were configured, then we only load those. See discussion in https://github.com/elastic/ingest-dev/issues/3424 about changing this, or making it configurable.
It matches what I expect to happen based on reading I've done, but this all depends on implementation details of the Go TLS implementation. The exact behavior will be easiest to confirm via testing. |
In the context of https://github.com/elastic/ingest-dev/issues/3443 (where we want to provide a UI for the user to easily swap these CA certs), having a UI is somewhat superfluous unless changing the CA causes a reset of the connection on the agent that receives it. Further, we have this same option on the Elasticsearch output of *beats, see: https://www.elastic.co/guide/en/beats/filebeat/current/securing-communication-elasticsearch.html :
Here the user can configure a different CA. There's no mention of resetting or restarting Filebeat (perhaps we also need to address this doc section). I believe the same YAML can be applied in the "Advanced YAML" section of the Elasticsearch output. So we definitely need to confirm whether a restart is required. My preference is to reload without needing a restart. FYI @AndersonQ, as you have been looking into this area lately. |
For now I think we should have an issue for testing these theories to determine what level of work is required to fix this. We can pursue the UI efforts in parallel. |
@belimawr did mention this on a thread: https://github.com/elastic/beats/blob/3102b496b9e9f0eae8c7eb685b1217734d40190b/filebeat/filebeat.reference.yml#L1710-L1716 - that beats will restart if the certificates change |
Agent restarts Beats (not all inputs, just Beats) automatically when any output parameter changes because of a historical bug in output hot reloading that hasn't been investigated/resolved. Ideally we wouldn't do this. |
This option is disabled by default. As Craig said, any change to output parameters will restart a Beat. It's an implementation detail, but the Beat restarts itself; the Elastic-Agent just sends the new config and the Beat decides what to do. This behaviour is configurable (https://github.com/elastic/beats/blob/3c9f4d952bfd20b1898cfeb59916a2239b667988/x-pack/agentbeat/agentbeat.spec.yml#L74-L75); as it is required for the Beat to function correctly, it is always enabled. |
Describe the enhancement:
During the lifecycle of a deployment, the certificates used by the agent to establish TLS connections will inevitably expire, and new ones will need to be used. This issue is to discuss the best approach to supporting this and to describe how a user would go about recycling the certificates they have on all their agents.
This may involve: