[Elastic Agent] Provide a better mechanism to update the Certificates used by Agent #4557

Open
2 tasks
nimarezainia opened this issue Apr 10, 2024 · 15 comments
Labels
Team:Elastic-Agent Label for the Agent team

Comments

@nimarezainia
Contributor

nimarezainia commented Apr 10, 2024

Describe the enhancement:

During the life cycle of a deployment, the certificates used by the agent to establish TLS connections will inevitably expire, and new ones will need to be used. This issue is to discuss the best approach to supporting this, and to describe how a user would go about recycling the certificates they have on all their agents.

This may involve:

  • What actions need to take place on the agent to accept the new certificate?
  • What changes are needed in Fleet so that the user can initiate use of the new certs?
@pierrehilbert added the Team:Elastic-Agent label Apr 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member

cmacknz commented Apr 10, 2024

A few thoughts on how we could accomplish this:

1. In the Fleet UI you can only change the Fleet Server hosts, I believe; I don't think we expose the TLS settings. So there's no way to push TLS updates to agents remotely at all; you have to do it from the command line. We need to fix this.

[Image: Screenshot 2024-04-10 at 5 06 37 PM]

2. As long as the certificates that changed aren't in the system certificate store, I think we can reload on the fly by just breaking the connection to Fleet and recreating it, per golang/go#35887.

3. Also per golang/go#35887, if the change the user needs to make is to the system certificate store, then we need to restart, so being able to remotely restart agents would help here. I think this would also cover case 2, and reloading without restarting just becomes an optimization.
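
Editor's sketch of the "break the connection and recreate it" idea, for CAs that are supplied explicitly rather than read from the system store. It assumes a plain `net/http` client; the names `reloadableClient`, `newTransport`, `Reload` and `demo` are hypothetical and are not taken from the Elastic Agent code base. A local `httptest` TLS server stands in for Fleet Server.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// newTransport builds a transport that trusts only the CAs in pool.
func newTransport(pool *x509.CertPool) *http.Transport {
	return &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
	}
}

// reloadableClient (hypothetical) allows the trusted CA set to be swapped
// at runtime without restarting the process.
type reloadableClient struct {
	mu        sync.Mutex
	transport *http.Transport
}

// newReloadableClient starts with an empty trust pool, so every TLS
// handshake fails until Reload installs a CA.
func newReloadableClient() *reloadableClient {
	return &reloadableClient{transport: newTransport(x509.NewCertPool())}
}

// Reload validates the PEM bundle first, then swaps in a new transport and
// closes the old transport's idle connections so later requests handshake
// again against the new pool.
func (c *reloadableClient) Reload(caPEM []byte) error {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return errors.New("no valid certificate found in PEM data")
	}
	c.mu.Lock()
	old := c.transport
	c.transport = newTransport(pool)
	c.mu.Unlock()
	old.CloseIdleConnections()
	return nil
}

// Get issues a request using whichever transport is current.
func (c *reloadableClient) Get(url string) (int, error) {
	c.mu.Lock()
	t := c.transport
	c.mu.Unlock()
	resp, err := (&http.Client{Transport: t}).Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo requests once with no trusted CA, reloads with the server's own
// certificate as the trust anchor, then requests again.
func demo() (errBefore error, statusAfter int, errAfter error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	c := newReloadableClient()
	_, errBefore = c.Get(srv.URL)

	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})
	if err := c.Reload(caPEM); err != nil {
		return errBefore, 0, err
	}
	statusAfter, errAfter = c.Get(srv.URL)
	return
}

func main() {
	errBefore, status, errAfter := demo()
	fmt.Println("before reload:", errBefore)
	fmt.Println("after reload: ", status, errAfter)
}
```

The sketch says nothing about the system certificate store case from item 3, where a restart may still be needed.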

@strawgate changed the title from "[Elastic Agent] Provide a better mechanism to update the Certifcates used by Agent" to "[Elastic Agent] Provide a better mechanism to update the Certificates used by Agent" on Apr 11, 2024
@strawgate
Contributor

strawgate commented Apr 11, 2024

Just to confirm, regarding "the certificates used by the agent to establish TLS connections": is this request related to client verification (mutual auth) -- i.e. a private key that exists on the client that Fleet creates and occasionally needs to be recreated? Or is it related to server verification -- updating the CA certificate that Agent uses to verify that it's talking to a trusted Fleet Server?

I assume it's just updating the CA agent uses to verify Fleet but just making sure

@cmacknz
Member

cmacknz commented Apr 11, 2024

It is updating the CA agent uses to verify it is talking to a trusted Fleet server.

See elastic/ingest-docs#167

@strawgate
Contributor

Ok yeah, sounds like we need to allow Fleet to update allowed CAs via Policy with a big warning message that says you can quite easily blow up your entire environment doing so?

@cmacknz
Member

cmacknz commented Apr 11, 2024

you can quite easily blow up your entire environment doing so?

The approach we have taken with similar things, like updating proxy settings and the soon-to-be-supported mTLS settings, is to have the agent test that it can still reach Fleet Server before committing and making the configuration change permanent. Hopefully we can do something similar with the CA, assuming both the old and the new CA are usable at the time the switch is made.

@nimarezainia
Contributor Author

@cmacknz @strawgate question: if the CA is changed in this manner, does it affect already established connections? Isn't this CA used only during the handshake? There's a mention of agent restarts which suggests otherwise.

@strawgate
Contributor

there's a mention of agent restarts which says otherwise

It sounds like the Agent loads certificates from the system store on startup.

This causes an edge case where the Administrator has recently loaded a new CA into the system store and the Agent loads a policy where the user is trying to use that certificate from the store.

The agent won't see the new certificate in the store unless it was restarted between the time the administrator added the certificate to the store and when it loads the policy that's pointing at the certificate.

If, upon updating the certificate used in the Fleet policy, the Agent breaks its connection to Fleet and starts a new one (per @cmacknz's comment above), the Agent would attempt to start a new connection but would find that the CA cert it's supposed to use is not in its in-memory cache of the system store.

@cmacknz does this match your understanding?

@cmacknz
Member

cmacknz commented Jun 7, 2024

It sounds like the Agent loads certificates from the system store on startup.

Unless custom CAs are configured, in which case we only load those. See the discussion in https://github.com/elastic/ingest-dev/issues/3424 about changing this or making it configurable.

If, upon updating the certificate used in the fleet policy, the Agent breaks its connection to Fleet and starts a new one (per @cmacknz's comment above) the Agent would attempt to start a new connection but would find the CA cert it's supposed to use is not in its in memory cache of the system store.

@cmacknz does this match your understanding?

It matches what I expect to happen based on reading I've done, but this all depends on implementation details of the Go TLS implementation. The exact behavior will be easiest to confirm via testing.
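
Editor's sketch of one way to probe the two trust configurations mentioned in this comment (custom CAs only, versus system store plus custom CAs) without any network traffic. The helper names `buildRootPool`, `newTestCA` and `trusted` are hypothetical, and the CA is a throwaway generated in memory. It does not show how the system store is cached, which can differ by platform and Go version.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// buildRootPool returns the trust pool for a connection. With
// includeSystem=false only the custom CAs are trusted; with true they are
// appended to a copy of the system pool.
func buildRootPool(customPEM []byte, includeSystem bool) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if includeSystem {
		sys, err := x509.SystemCertPool()
		if err != nil {
			return nil, fmt.Errorf("loading system pool: %w", err)
		}
		pool = sys
	}
	if len(customPEM) > 0 && !pool.AppendCertsFromPEM(customPEM) {
		return nil, errors.New("no valid certificate found in custom PEM data")
	}
	return pool, nil
}

// newTestCA creates a throwaway self-signed CA for the demonstration.
func newTestCA() (*x509.Certificate, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "throwaway test CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, nil, err
	}
	return cert, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// trusted reports whether cert verifies against the given root pool.
func trusted(cert *x509.Certificate, pool *x509.CertPool) bool {
	_, err := cert.Verify(x509.VerifyOptions{Roots: pool})
	return err == nil
}

func main() {
	ca, caPEM, err := newTestCA()
	if err != nil {
		panic(err)
	}
	empty, _ := buildRootPool(nil, false)
	custom, err := buildRootPool(caPEM, false)
	if err != nil {
		panic(err)
	}
	fmt.Println("trusted by empty pool: ", trusted(ca, empty))
	fmt.Println("trusted by custom pool:", trusted(ca, custom))
}
```

A real test of the restart question would still need an agent, a changed system store, and a reconnect, as suggested above.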

@nimarezainia
Contributor Author

In the context of https://github.com/elastic/ingest-dev/issues/3443 (where we want to provide a UI for the user to easily swap these CA certs), having a UI is somewhat superfluous unless changing the CA causes the agent that receives it to reset its connection.

Further, we have this same option on the Elasticsearch output of *beats, see: https://www.elastic.co/guide/en/beats/filebeat/current/securing-communication-elasticsearch.html :

output.elasticsearch:
  hosts: ["https://myEShost:9200"]
  ssl.certificate_authorities:
    - /etc/pki/my_root_ca.pem
    - /etc/pki/my_other_ca.pem
  ssl.certificate: "/etc/pki/client.pem"
  ssl.key: "/etc/pki/key.pem"

Here the user can configure a different CA. There's no mention of resetting or restarting Filebeat (perhaps we also need to address this doc section). I believe the same YAML can be applied in the "Advanced yaml" section of the Elasticsearch output. So we definitely need to confirm whether a restart is required.

My preference is to reload without needing a restart

fyi @AndersonQ as you have been looking into this area lately.

@nimarezainia
Contributor Author

For now I think we should open an issue for testing these theories, to determine how much work is required to fix this. We can pursue the UI efforts in parallel.

@nimarezainia
Contributor Author

@belimawr did mention this on a thread: https://github.com/elastic/beats/blob/3102b496b9e9f0eae8c7eb685b1217734d40190b/filebeat/filebeat.reference.yml#L1710-L1716 - that beats will restart if the certificates change

@cmacknz
Member

cmacknz commented Jun 11, 2024

Agent restarts Beats (not all inputs, just Beats) automatically when any output parameter changes because of a historical bug in output hot reloading that hasn't been investigated/resolved.

Ideally we wouldn't do this.

@belimawr
Contributor

@belimawr did mention this on a thread: https://github.com/elastic/beats/blob/3102b496b9e9f0eae8c7eb685b1217734d40190b/filebeat/filebeat.reference.yml#L1710-L1716 - that beats will restart if the certificates change

This option is disabled by default.

As Craig said, any change to output parameters will restart a Beat. It's an implementation detail, but the Beat restarts itself: the Elastic Agent just sends the new config and the Beat decides what to do.

This behaviour is configurable (https://github.com/elastic/beats/blob/3c9f4d952bfd20b1898cfeb59916a2239b667988/x-pack/agentbeat/agentbeat.spec.yml#L74-L75), but because it is required for the Beat to function correctly, it is always enabled.
