CRITICAL: Hitch crashed production server because of one faulty certificate pem file #369

Open
dgaastra opened this issue Mar 17, 2022 · 6 comments

@dgaastra

Expected Behavior

Expected Hitch to just ignore the faulty pem certificate and run happily.

Current Behavior

Mar 17 12:46:36 web2 hitch[2813]: 20220317T124636.810693 [ 2813] {core} hitch 1.6.1 starting
Mar 17 12:46:36 web2 hitch[2813]: 20220317T124636.812323 [ 2813] {core} Loading certificate pem files (11)
Mar 17 12:46:36 web2 systemd[1]: hitch.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit hitch.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Mar 17 12:46:36 web2 systemd[1]: hitch.service: Failed with result 'exit-code'.

Possible Solution

Just ignore the faulty pem file but keep on running with the correct ones.

Steps to Reproduce (for bugs)

Put a bogus pem file in the directory that hitch reads certificates from.

Settings in the conf file:

pem-dir = "/lego/certificates"
pem-dir-glob = "*.pem"
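
A minimal sketch of how the steps above could be set up in a scratch directory, so the failure can be tried away from production. The function names `make_bogus_pem_dir` and `print_pem_dir_conf` are hypothetical, and the content of the bogus file is an assumption, since the original faulty pem file was not recorded.

```shell
#!/bin/bash
# Hypothetical helpers for a reproduction attempt; not part of hitch.

# Create a scratch directory holding one deliberately invalid .pem file
# and print the directory path.
make_bogus_pem_dir() {
    local dir
    dir=$(mktemp -d) || return 1
    # Assumption: any non-PEM content counts as "bogus" for this test.
    printf '%s\n' 'this is not a certificate' > "$dir/bogus.pem"
    echo "$dir"
}

# Print the two config lines from this report, pointed at the given directory.
print_pem_dir_conf() {
    printf 'pem-dir = "%s"\n' "$1"
    printf 'pem-dir-glob = "*.pem"\n'
}
```

The printed lines would still need to be merged into a complete hitch.conf (frontend, backend and so on) before starting hitch against it.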

Context

Very nasty; all production websites down for a while.

Your Environment

Debian; everything fairly up to date.
hitch 1.6.1 (installed with: sudo apt install hitch )

If this was fixed after version 1.6.1, we sincerely apologise for this bug report and hope Debian will bring its package up to date.

Thanks for making such a great piece of software,
Dennis Gaastra

@gquintard
Contributor

ping @daghf, @dridi

@Keeline

Keeline commented Sep 23, 2022

We have had this issue from time to time. A partially created or missing pem file will cause hitch to crash upon restart. Usually this is followed by a scramble to identify the offending line from the output of service hitch status, comment it out of hitch.conf, and restart hitch.

We have other servers where SSL is terminated with nginx. An nginx -t is fairly robust to check the configuration files and will report on missing or flawed files before we attempt to restart nginx.

The equivalent hitch -t only seems to check that the hitch.conf is syntactically correct. This is only part of the issue. It certainly knows there is a problem when it attempts to restart. Why not some kind of dry run option to prevent problems?

I wrote a small script to at least check and see that the file mentioned in the pem lines exists.

James D. Keeline


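#!/bin/bash
# Pre-restart check: test hitch.conf, then confirm every pem-file path exists.
HITCH=/etc/hitch/hitch.conf
ERR=0

hitch -t || ERR=1

# Take the quoted path from each pem-file line. Matching only pem-file skips
# pem-dir and pem-dir-glob, which are not regular files and would be
# reported as missing.
while IFS= read -r PEM
do
    if [ ! -f "$PEM" ]; then
        echo "$PEM missing"
        ERR=2
    fi
done < <(grep '^pem-file' "$HITCH" | awk -F'"' '{print $2}')

if [ "$ERR" -gt 0 ]; then
    echo "Errors found [$ERR]. Do not restart hitch."
    exit 1
else
    echo "Scan of $HITCH done. It should be OK to restart hitch."
fi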

@daghf daghf self-assigned this Sep 27, 2022
@dgaastra
Author

Thanks for the script, but we really need the hitch developers to "Just ignore the faulty pem file but keep on running with the correct ones."

@daghf
Member

daghf commented Jun 16, 2023

Apologies for taking my time in getting back to you here.

I'm sorry to say I'm struggling to reproduce this - even when trying 1.6.1. Adding bogus files to a pem-dir or adding a pem-file entry pointing at a missing file just yields Config reload failed with the service still running on the previous config.

Any way you could come up with a reproducer?

@dgaastra
Author

Hi Dag, thanks for looking into this. We have

pem-dir = "/htdocs/admin/lego/certificates"
pem-dir-glob = "*.pem"

Our PEMs are typically in the following format:

-----BEGIN CERTIFICATE-----
C1...
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
C2...
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
C3...
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
P1
-----END RSA PRIVATE KEY-----
-----BEGIN DH PARAMETERS-----
D1
-----END DH PARAMETERS-----
-----BEGIN DH PARAMETERS-----
D2
-----END DH PARAMETERS-----

Try leaving out one or more of the sections C1-C3, P1, or D1-D2 and see what happens. I don't remember the bogus PEM in detail, but I will take note of it the next time it happens. Maybe start by leaving P1 out.

Thanks so kindly,
Dennis
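
A minimal sketch of a structural pre-check based on the layout above, for spotting a pem file that lacks a certificate block or a private key block before hitch loads it. The function names `pem_has_cert_and_key` and `scan_pem_dir` are hypothetical, and this only looks for the BEGIN markers; it does not parse the contents, verify that the key matches the certificate, or reflect how hitch itself validates files.

```shell
#!/bin/bash
# Hypothetical helpers; a marker-level check only, not real PEM validation.

# Succeed if the file contains at least one certificate block marker and
# one private key block marker (e.g. "RSA PRIVATE KEY" or "PRIVATE KEY").
pem_has_cert_and_key() {
    local file=$1
    grep -q -e '-----BEGIN CERTIFICATE-----' "$file" &&
        grep -Eq -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$file"
}

# Check every regular file matching the glob in the directory.
# Prints one "incomplete: <path>" line per failing file and returns 1
# if any file failed, 0 otherwise.
scan_pem_dir() {
    local dir=$1 glob=${2:-*.pem} bad=0 f
    for f in "$dir"/$glob; do
        [ -f "$f" ] || continue   # also skips the unexpanded glob when nothing matches
        if ! pem_has_cert_and_key "$f"; then
            echo "incomplete: $f"
            bad=1
        fi
    done
    return "$bad"
}
```

For the directory in this comment, the call would be `scan_pem_dir /htdocs/admin/lego/certificates '*.pem'`. A file with P1 left out would be reported as incomplete.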

@iammeken

iammeken commented Jun 16, 2023

Normally, I will run

hitch -t --config=/etc/hitch/hitch.conf

to check all certs before reload/restart
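
A minimal sketch of wrapping that check so a reload only happens when the test passes. The names `check_config`, `do_reload` and `reload_if_valid` are hypothetical, and `systemctl reload hitch` is an assumption about how the service is managed on the host. Whether `hitch -t` catches every faulty pem file is disputed earlier in this thread, so this guards against the failures the test does detect and nothing more.

```shell
#!/bin/bash
# Hypothetical wrapper: reload hitch only if the config test succeeds.
HITCH_CONF="${HITCH_CONF:-/etc/hitch/hitch.conf}"

# Run hitch's config test against the chosen file.
check_config() {
    hitch -t --config="$HITCH_CONF"
}

# Assumption: hitch runs as a systemd unit named "hitch" that supports reload.
do_reload() {
    systemctl reload hitch
}

# Reload only when the test passes; otherwise leave the running
# instance alone and return non-zero.
reload_if_valid() {
    if check_config; then
        do_reload
    else
        echo "Config test failed; not reloading hitch." >&2
        return 1
    fi
}
```

Calling `reload_if_valid` from a certificate renewal hook instead of restarting hitch directly keeps the running process untouched when the test fails.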

5 participants