[Graylog + DataNode] None of the TrustManagers trust this certificate chain #20027

Closed
pavlo-tk opened this issue Jul 29, 2024 · 5 comments

@pavlo-tk

Hi folks,

I'm asking for help or guidance with the following problem. If it turns out to be a bug, even better: it will be on record.

I'm using Graylog and Graylog DataNode in a Docker environment, with a compose.yaml file that follows the official example published here: https://github.com/Graylog2/docker-compose/blob/main/open-core/docker-compose.yml.

Everything works: SSL certificates are provisioned in the Preflight UI, and the setup survives a complete docker compose down / docker compose up with everything preserved.

It works well for some time, maybe weeks (I can't tell exactly), and then one day Graylog suddenly can't communicate with the DataNode because of SSL issues. The logs from both containers are:

Graylog

INFO : org.graylog2.storage.versionprobe.VersionProbe - OpenSearch/Elasticsearch is not available. Retry #17
ERROR: org.graylog2.storage.versionprobe.VersionProbe - Unable to retrieve version from Elasticsearch node: None of the TrustManagers trust this certificate chain. - None of the TrustManagers trust this certificate chain.

DataNode

WARN [OpensearchNodeHeartbeat] Opensearch REST api of process 124 unavailable. Cause: None of the TrustManagers trust this certificate chain. See https://opensearch.org/docs/latest/clients/java-rest-high-level/ for troubleshooting.
INFO [OpensearchProcessImpl] [2024-07-29T17:21:57,206][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [graylog-datanode] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
INFO [OpensearchProcessImpl] javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
... a lot of other traces/lines

I've exhausted my options and research and can't find anything that helps. Disabling SSL between Graylog and DataNode would work for me too, but nothing I've tried has worked so far. This is a breaking point: either I figure this out, or I have to replace Graylog with something else.

Expected Behavior

Just keep working like it was before "something happens".

Your Environment

  • Graylog Version: 5.2.5 (graylog/graylog:5.2.5)
  • DataNode Version: 5.2.5 (graylog/graylog-datanode:5.2.5)
  • MongoDB Version: 7.0.6 (mongo:7.0.6)
  • Operating System: Docker on WSL 2 under Windows 11
pavlo-tk added the bug label Jul 29, 2024
pavlo-tk changed the title from "None of the TrustManagers trust this certificate chain" to "[Graylog + DataNode] None of the TrustManagers trust this certificate chain" Jul 29, 2024
@todvora
Contributor

todvora commented Aug 1, 2024

Hello @pavlo-tk,
thank you for the bug report. I assume that your certificate generated for the datanode expired during the time when your containers were stopped. When you start them again, the certificate is expired, the datanode won't start and this also blocks the startup of the graylog server. Does it sound plausible?

We have fixed this situation recently and it will be a part of the next release.

You are running quite old versions and there has been a lot of progress around the Datanode lately. I'd suggest that you update to a newer version. If your setup is not production or critical infrastructure, you can also try the latest alpha release: https://hub.docker.com/r/graylog/graylog-datanode/tags

As a hotfix, I would suggest setting insecure_startup=true in your datanode config file or GRAYLOG_DATANODE_INSECURE_STARTUP=true in your env (edited, not SETUP as stated originally but STARTUP instead). This will disable SSL for your datanode. Then you can adapt your renewal policy for certificates (maybe set some longer expiration if you take your containers down for longer periods of time) and renew your datanode certificate from the UI. Then you should be able to remove the insecure setup configuration and get back to SSL afterwards.
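
For a Compose-based setup like the one in this issue, the environment-variable form of that hotfix might look like the fragment below. This is only a sketch: the service name and image tag are taken from the reporter's compose.yaml, the other environment entries are omitted, and quoting the value is an assumption based on the general Compose advice to quote boolean-like strings.

```yaml
services:
  graylog-datanode:
    image: graylog/graylog-datanode:5.2.5
    environment:
      # Temporary hotfix: disables SSL for the DataNode.
      # The name ends in STARTUP, not SETUP.
      GRAYLOG_DATANODE_INSECURE_STARTUP: "true"
```

Once the certificate has been renewed from the UI, the variable would be removed again to return to SSL.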

There is another known workaround, manipulating the time of the machine: Graylog2/docker-compose#63 (comment), if this is something you want to try.

Sorry for the inconvenience and please let me know if this fixed your situation or there is anything else I can help you with.

Best regards,
Tomas

@pavlo-tk
Author

pavlo-tk commented Aug 1, 2024

Hi @todvora,

Thank you for the tips! I appreciate the time you took to answer.
Your assumption about expired certificates during downtime is logical and makes the most sense.

It's nice to see that you merged the potential fix just yesterday. I will upgrade my setup as soon as the next stable release that includes it is out.

In the meantime, I was excited to learn about insecure_startup, even more so because Graylog and DataNode are running on the same host, so there is little need for certificates. However, after adding GRAYLOG_DATANODE_INSECURE_SETUP to the compose.yaml and running docker compose down / docker compose up -d, nothing changed: I see the same errors and Graylog doesn't start. So GRAYLOG_DATANODE_INSECURE_SETUP didn't have any effect.

Below is part of my compose.yaml:

  graylog-datanode:
    container_name: graylog-datanode
    hostname: graylog-datanode
    image: graylog/graylog-datanode:5.2.5
    environment:
      GRAYLOG_DATANODE_HOSTNAME: graylog-datanode
      GRAYLOG_DATANODE_PASSWORD_SECRET: $GRAYLOG_PASSWORD_SECRET
      GRAYLOG_DATANODE_ROOT_PASSWORD_SHA2: $GRAYLOG_ROOT_PASSWORD_SHA2
      GRAYLOG_DATANODE_MONGODB_URI: "mongodb://mongo:27017/graylog"
      GRAYLOG_DATANODE_INSECURE_SETUP: true
    volumes:
      - graylog-datanode:/var/lib/graylog-datanode
    restart: on-failure

I really want this insecure_startup to work.

  1. Do you have any suggestions or an idea of what might be wrong here and why this setting had no effect?
  2. I would also like to try this setting in the DataNode config file directly: could you let me know its location in the container? The Default File Locations documentation doesn't cover Docker or DataNode, and the only datanode.conf I found in the container was at /etc/graylog/datanode/datanode.conf, and it was empty.

Thank you, Tomas, I appreciate you.

@todvora
Contributor

todvora commented Aug 5, 2024

Aaah, sorry, it's called GRAYLOG_DATANODE_INSECURE_STARTUP, my mistake. It's startup, not setup, tricky naming :-/ (I also corrected my original response, to prevent any further confusion to anyone reading this issue). Could you please try again with GRAYLOG_DATANODE_INSECURE_STARTUP ?

The /etc/graylog/datanode/datanode.conf is intentionally empty, as all the configuration is either default or passed as env properties. But you can mount your config file to that location and it should be used instead.
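
A sketch of such a mount in Compose, assuming a local file at ./datanode.conf (a hypothetical path chosen for illustration) that holds the insecure_startup=true line mentioned earlier. Only the container-side path and the named volume come from this thread; the rest of the service definition is omitted.

```yaml
services:
  graylog-datanode:
    image: graylog/graylog-datanode:5.2.5
    volumes:
      - graylog-datanode:/var/lib/graylog-datanode
      # ./datanode.conf is a hypothetical local file containing, for example:
      #   insecure_startup=true
      - ./datanode.conf:/etc/graylog/datanode/datanode.conf
```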

@pavlo-tk
Author

pavlo-tk commented Aug 5, 2024

Hi Tomas,

What a relief! Setting GRAYLOG_DATANODE_INSECURE_STARTUP: true worked flawlessly, and everything is back up where I left it. Now I know how to deal with this issue. I might leave certificates out of the equation in the production environment as well, since they don't provide much value within the same machine.

Thank you for confirming about the datanode.conf.

Thank you, friend!

pavlo-tk closed this as completed Aug 5, 2024
@todvora
Contributor

todvora commented Aug 6, 2024

Nice to hear that everything worked. I am glad I could help :-)

Best regards,
Tomas
