Agent stays in 'updating' on self-managed cluster (works when using cloud for stack) #24274

Closed
EricDavisX opened this issue Mar 1, 2021 · 18 comments
Labels: Team:Elastic-Agent, Team:Fleet

Comments

@EricDavisX
Contributor

EricDavisX commented Mar 1, 2021

Using a self-managed 7.11.1 environment, a fellow Elastician (Xiaoguo Liu) reports that the Agent stays in 'updating' and does not send documents to ES (no logs).

He is using this article as his basis for setup:
https://newtonpaul.com/how-to-install-elastic-siem-and-elastic-edr/

Most notably, the steps include the use of a self-generated certificate.

He notes that he tried setting up Elastic on cloud and it worked on his host, so we know the host is ok (or was at one point).

During a Slack discussion, he noted: "I have checked very carefully according to the link https://www.elastic.co/guide/en/fleet/current/fleet-troubleshooting.html. The settings for Elasticsearch and Kibana should be all right."

He is seeing this in the logs on the host:

2021-02-25T17:55:49.358+0800	INFO	warn/warn.go:18	The Elastic Agent is currently in BETA and should not be used in production
2021-02-25T17:55:49.359+0800	INFO	application/application.go:59	Detecting execution mode
2021-02-25T17:55:49.359+0800	INFO	application/application.go:72	Agent is managed by Fleet
2021-02-25T17:55:49.523+0800	INFO	[composable]	composable/controller.go:44	EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-02-25T17:55:49.544+0800	INFO	[composable.providers.docker]	docker/docker.go:43	Docker provider skipped, unable to connect: protocol not available
2021-02-25T17:55:49.553+0800	INFO	[api]	api/server.go:62	Starting stats endpoint
2021-02-25T17:55:49.553+0800	INFO	application/managed_mode.go:294	Agent is starting
2021-02-25T17:55:49.553+0800	WARN	application/managed_mode.go:301	failed to ack update open C:\Program Files\Elastic\Agent\data\.update-marker: The system cannot find the file specified.
2021-02-25T17:55:49.553+0800	INFO	[api]	api/server.go:64	Metrics endpoint listening on: \\.\pipe\elastic-agent (configured: npipe:///elastic-agent)
2021-02-25T17:55:49.609+0800	ERROR	application/fleet_gateway.go:187	Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "https://192.168.0.4:5601/api/fleet/agents/a43635f0-774f-11eb-a006-83439a24417e/checkin?": x509: certificate signed by unknown authority
2021-03-01T22:06:13.749+0800	ERROR	application/fleet_gateway.go:187	Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "https://192.168.0.4:5601/api/fleet/agents/a43635f0-774f-11eb-a006-83439a24417e/checkin?": x509: certificate signed by unknown authority
2021-03-01T22:08:54.151+0800	ERROR	application/fleet_gateway.go:187	Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "https://192.168.0.4:5601/api/fleet/agents/a43635f0-774f-11eb-a006-83439a24417e/checkin?": x509: certificate signed by unknown authority
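
For anyone triaging a similar log, here is a minimal Python sketch that pulls the failing URL and the underlying cause out of the check-in errors above. The regular expression is an assumption derived only from these sample lines, not an official Elastic Agent log format, and the function names are hypothetical.

```python
import re

# ASSUMPTION: pattern inferred from the sample ERROR lines in this issue.
# It accepts straight or curly quotes around the URL, since pasted logs
# sometimes have their quotes rewritten by chat tools.
CHECKIN_ERROR = re.compile(
    r'^(?P<ts>\S+)\s+ERROR\s+(?P<src>\S+)\s+.*?'
    r'Post\s+["“”](?P<url>[^"“”]+)["“”]:\s+(?P<cause>.+)$'
)


def parse_checkin_error(line):
    """Return timestamp, source, URL and cause for a check-in ERROR line, else None."""
    match = CHECKIN_ERROR.match(line.strip())
    return match.groupdict() if match else None


def is_untrusted_ca(cause):
    """True when the cause is the TLS trust failure reported in this issue."""
    return "x509: certificate signed by unknown authority" in cause
```

The URL shows which connection failed: port 5601 is Kibana in this setup, so the trust failure is on the Agent-to-Kibana leg rather than the Elasticsearch output.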

More details from chat:
Using Windows as the host.

He was using the Default policy and seeing this, before switching to a policy with Endpoint.

The problem happened when I applied it to my self-managed cluster, which ran on my macOS machine. I have enabled HTTPS for my cluster. In the past, I did it without enabling HTTPS, and it was successful. Both the Mac and Ubuntu machines are on the same LAN and can see each other. I am not sure whether this is due to the self-signed certificate.

==============================
elasticsearch.yml
discovery.type: single-node
xpack.security.enabled: true
xpack.security.authc.api_key.enabled: true

# Transport layer

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/elasticsearch.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/elasticsearch.crt
xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]

# HTTP layer

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.verification_mode: certificate
xpack.security.http.ssl.key: /etc/elasticsearch/certs/elasticsearch.key
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/elasticsearch.crt
xpack.security.http.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]

==============================
kibana.yml
elasticsearch.hosts: ["https://192.168.0.4:9200"]
elasticsearch.ssl.certificateAuthorities: ["/etc/kibana/certs/ca/ca.crt"]
elasticsearch.ssl.certificate: "/etc/kibana/certs/kibana.crt"
elasticsearch.ssl.key: "/etc/kibana/certs/kibana.key"

# These settings enable SSL for outgoing requests from the Kibana server to the browser.

server.ssl.enabled: true
server.ssl.certificate: "/etc/kibana/certs/kibana.crt"
server.ssl.key: "/etc/kibana/certs/kibana.key"
xpack.fleet.enabled: true
xpack.security.enabled: true
xpack.fleet.agents.tlsCheckDisabled: true
xpack.encryptedSavedObjects.encryptionKey: "something_at_least_32_characters"
elasticsearch.username: "elastic"
elasticsearch.password: "password"
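
Since Kibana here serves HTTPS with a certificate signed by a custom CA, one quick check from the agent host is whether a TLS handshake to Kibana verifies against a given CA file. The following is a minimal sketch using only Python's standard library. The function names are hypothetical, and Python's verification is not identical to what the Go-based Agent or the Windows certificate store does, so treat the result as a hint rather than proof.

```python
import socket
import ssl


def build_context(ca_file=None):
    """Client TLS context; with ca_file, trust only the CAs in that file."""
    if ca_file is None:
        # Falls back to the default trust store that Python uses on this host.
        return ssl.create_default_context()
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verification is on by default
    ctx.load_verify_locations(cafile=ca_file)
    return ctx


def check_kibana_tls(host, port, ca_file=None, timeout=5.0):
    """Try a verified handshake; return (ok, detail)."""
    ctx = build_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # For example an untrusted issuer or a host name / IP mismatch.
        return False, exc.verify_message
    except OSError as exc:
        # Connection refused, timeout, other TLS errors, and so on.
        return False, str(exc)
```

A call such as `check_kibana_tls("192.168.0.4", 5601, ca_file=r"C:\beats\ca.crt")` (address and path taken from this thread) would show whether that CA file validates the certificate Kibana presents.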

images:

no-logs-in-es

status-is-updating-stuck

@EricDavisX added the Team:Elastic-Agent and Team:Fleet labels Mar 1, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@elasticmachine
Collaborator

Pinging @elastic/fleet (Team:Fleet)

@EricDavisX
Contributor Author

@ph can we follow up? It may be something we want to fix for 7.12.

@EricDavisX
Contributor Author

I want to check whether this is actually one of our known issues or a duplicate.

@EricDavisX
Contributor Author

@liu-xiao-guo hi - @michalpristas was asking you:

Was the output definition changed to contain a cert definition in Fleet?

In the Fleet app, under Settings, there is a section called "Elasticsearch output configuration". This is where the SSL config should be placed; it is then forwarded to the output.

If you have some logs, they would be of great help. The agent state should not be related to the output definition, so maybe we are seeing two issues here (and we need to log the second one separately).

@liu-xiao-guo

Is there any documentation on how to set this up there? I do not know the correct format for filling it in. The closest discussion I have seen is at elastic/kibana#73483. However, I still do not have a clue about the format:
image

@liu-xiao-guo

liu-xiao-guo commented Mar 4, 2021

Thanks to Eric for his help. I followed the issue at elastic/kibana#75913, but I got the same error as before.
image

ssl.certificate_authorities: ["C:\beats\ca.crt"]
certificate_authorities:

  - |
    -----BEGIN CERTIFICATE-----
    MIIDSjCCAjKgAwIBAgIVAONMH6yLC9bqur0ln5yb83oZihn/MA0GCSqGSIb3DQEB
    CwUAMDQxMjAwBgNVBAMTKUVsYXN0aWMgQ2VydGlmaWNhdGUgVG9vbCBBdXRvZ2Vu
    ZXJhdGVkIENBMB4XDTIxMDIyNDE0MTYxN1oXDTI0MDIyNDE0MTYxN1owNDEyMDAG
    A1UEAxMpRWxhc3RpYyBDZXJ0aWZpY2F0ZSBUb29sIEF1dG9nZW5lcmF0ZWQgQ0Ew
    ggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCWK9SBt3REq4Pt5ALwJMHj
    dmzS6KSfa5nK8X1RCsy6nAIXtp08/jx/annKi+RV4qrhTTW9JU9nuQGnvgKrEd9+
    8OPBwbikygeWtEBYXfCduo/83mg96QLQ0zfVBECfqmYheqgqiPHERxHs8v6S3QsL
    QPHq5k0Dak9aQ8zhNakSMalnvHmhJ7mASl00+Sc0NW+VsB2yHKrzCnOUdpr2mSLy
    RckAlxJpgm/+sz//48XJU97ibpQcMzEwzxMYmoiS+x3Bb/2TieNI2XnZ6q/lFX6t
    AgFHdPvWJqdchA6JCzfht7sEOdHISOBhE4/Up8apcGoiWgqOOKq41AJ+Eh+UVJO3
    AgMBAAGjUzBRMB0GA1UdDgQWBBRDJ+CRrKs4yjEwEeTU+5OsqUw7fTAfBgNVHSME
    GDAWgBRDJ+CRrKs4yjEwEeTU+5OsqUw7fTAPBgNVHRMBAf8EBTADAQH/MA0GCSqG
    SIb3DQEBCwUAA4IBAQBpKKeqxUD7co1IBfbjkujzRi9PNEVVs+MTQGg2ejPEb2V5
    /1VhPdjiO7OyMRUst2OgLpby6BF8OytCQU1MMJRJbXBzo8QWP0uWjsQTJzW8ol4G
    LaPlxFH/ZTJRJ8eDd2dW4d7mQznKC+L4PgYs6qGdf+TnpvXv14wWJjE5wpQLxvmn
    RmI6hd/kKm7usgcuKCNRhEPdntjK1SkAYY745P7o1DL9pJsX7mjRyC66ZJ/6CG0L
    ZzqUlrLI1kA8k/AxLXcM4JQJ0KIDokbKxethDiNGhBYUA/PgCNwG5XzBf2f+Zx2o
    1kJkzczG9nJb5fbZ3Y7H2w9TcmYh7lygt4QVIOWU
    -----END CERTIFICATE-----

Error log found at C:\Program Files\Elastic\Agent\elastic-agent.log
2021-03-04T11:37:05.333+0800 INFO warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-03-04T11:37:05.334+0800 INFO application/application.go:59 Detecting execution mode
2021-03-04T11:37:05.334+0800 INFO application/application.go:72 Agent is managed by Fleet
2021-03-04T11:37:05.425+0800 INFO [composable] composable/controller.go:44 EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-03-04T11:37:05.430+0800 INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: protocol not available
2021-03-04T11:37:05.435+0800 INFO [api] api/server.go:62 Starting stats endpoint
2021-03-04T11:37:05.435+0800 INFO application/managed_mode.go:294 Agent is starting
2021-03-04T11:37:05.435+0800 WARN application/managed_mode.go:301 failed to ack update open C:\Program Files\Elastic\Agent\data\.update-marker: The system cannot find the file specified.
2021-03-04T11:37:05.435+0800 INFO [api] api/server.go:64 Metrics endpoint listening on: \\.\pipe\elastic-agent (configured: npipe:///elastic-agent)
2021-03-04T11:37:05.675+0800 ERROR application/fleet_gateway.go:187 Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "https://192.168.0.4:5601/api/fleet/agents/e312ccc0-7c9a-11eb-8e58-dfec53b6045a/checkin?": x509: certificate signed by unknown authority

@EricDavisX
Contributor Author

I'm just now thinking: did we maybe change to a new Go library version, and is that causing this?

I see this fix from @kvch in #23661. It was not backported to 7.11, but I don't know for sure whether we changed the Go version there; I'm trying to track down the records.

@kvch
Contributor

kvch commented Mar 5, 2021

In 7.11 we use Go 1.14.12, so Agent and Beats are still lenient when it comes to DNSNames.

@liu-xiao-guo

When we compile a Go application, what actually runs is the binary. It should not depend on the runtime environment, since there is no separate Go runtime environment.

@blakerouse
Contributor

@liu-xiao-guo Setting the CA in that YAML block in Kibana only applies to the Elasticsearch output. Based on the logs in the issue description, you are having communication issues with Agent talking to Kibana, not Agent talking to Elasticsearch (it doesn't seem to even make it that far; you might still have an issue there).

So we need to focus on the communication between Agent and Kibana. You cannot set the CA for that from the UI. It can only be set during the enroll command. Can you provide the enrollment command you are using so I can start there?
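
As an illustration of passing the CA at enrollment time, here is a hedged Python sketch that only assembles the command line and does not run anything. The `install`, `-f`, `--kibana-url` and `--enrollment-token` parts come from this thread. The `--certificate-authorities` flag is an assumption to confirm against `elastic-agent install --help` for your version, and the helper name is hypothetical.

```python
def build_install_command(kibana_url, enrollment_token, ca_path=None,
                          exe=r".\elastic-agent.exe"):
    """Assemble (but do not run) an elastic-agent install command line."""
    args = [
        exe,
        "install",
        "-f",
        f"--kibana-url={kibana_url}",
        f"--enrollment-token={enrollment_token}",
    ]
    if ca_path:
        # ASSUMPTION: flag name not confirmed in this thread; check
        # `elastic-agent install --help` before relying on it.
        args.append(f"--certificate-authorities={ca_path}")
    return args


if __name__ == "__main__":
    # Placeholder token; the CA path is the one mentioned earlier in the thread.
    cmd = build_install_command(
        "https://192.168.0.4:5601", "<enrollment-token>", ca_path=r"C:\beats\ca.crt"
    )
    print(" ".join(cmd))
```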

@blakerouse self-assigned this Mar 8, 2021
@liu-xiao-guo

@blakerouse Thanks for your reply. My configuration is like:
image

and my command for connecting to Kibana is:
.\elastic-agent.exe install -f --kibana-url=https://192.168.0.4:5601 --enrollment-token=cjJaSzEzY0J6cjB1bzA2VElrbGc6RDhKUzRMRDNSb0MtZXBvOU9ZcVZsZw==

By the way, I have installed the self-created certificate into Windows according to the article at https://newtonpaul.com/how-to-install-elastic-siem-and-elastic-edr/

@blakerouse
Contributor

Okay, I can confirm that this does work with a custom CA and certs. I have been able to get Elastic Agent running with self-signed certificates and a custom CA.

I did hit a few issues that need fixing and would improve the experience, but nothing stops this from actually working.

#24484
#24485

Going to close this issue as I have it fully working.

@EricDavisX
Contributor Author

Thanks so much @blakerouse. @liu-xiao-guo if you have problems, we can research where it isn't documented well enough and add tickets to elastic/observability-docs (if not logged already) to improve it.

@EricDavisX
Contributor Author

@blakerouse I'm doing a postmortem review of issues and wanted to ask (@ph @ruflin too): is this a case we can detect better and log more helpfully, or not? I also have it open on my end to check the docs and see what can be improved there.

@blakerouse
Contributor

@EricDavisX It is not something we can really detect from the other side, because the connection is never established when something is wrong in the configuration.

When I was testing this out, I did a few things wrong, and the error messages that Elasticsearch, Kibana, or Elastic Agent returned were rather clear about what was wrong. We do propagate the lower-level TLS errors out to the log files when it fails.

There is always room for improvement, so maybe I just didn't hit a case where the error was unclear. Please file bugs for those so we can provide a good UX in these situations.

@blakerouse
Contributor

The issue on Windows is that the linked article, https://newtonpaul.com/how-to-install-elastic-siem-and-elastic-edr/, only installs the certificate in the Current User scope. The certificate needs to be installed for the entire local machine, so that when the Elastic Agent is running as a service (i.e. as the SYSTEM user) it also has access to the CA.

@blakerouse
Contributor

Installation of the certificate on Windows should be done with certlm.msc so the CA is installed at a local machine level.
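
For a scripted setup, a possible alternative to clicking through certlm.msc is the Windows `certutil` tool, which adds a certificate to the local machine's Trusted Root store when run from an elevated prompt. This is a swapped-in technique, not something recommended in this thread, and the Python helper names below are hypothetical. The sketch defaults to a dry run that only returns the command.

```python
import subprocess
import sys


def machine_root_import_command(ca_path):
    """certutil invocation that targets the machine-wide Root store.

    Without the -user option, certutil works on the local machine store,
    which is the scope a service running as SYSTEM can read.
    """
    return ["certutil", "-addstore", "-f", "Root", ca_path]


def import_ca_for_local_machine(ca_path, dry_run=True):
    """Return the command; execute it only on Windows when dry_run is False."""
    cmd = machine_root_import_command(ca_path)
    if dry_run or sys.platform != "win32":
        return cmd
    # Needs an elevated (Administrator) session to modify the machine store.
    subprocess.run(cmd, check=True)
    return cmd
```

After importing, restarting the Elastic Agent service lets it pick up the newly trusted CA on its next check-in attempt.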
