Add file-based locking to CRL modification #365

AdamWill · 2016-03-04T04:58:33Z

I'm running this fairly simple listener:

https://git.fedorahosted.org/cgit/fedora-qa.git/tree/check-compose/check-compose-fedmsg.in

to generate the 'compose check' emails. It's running on a Fedora 23 system, with:

python-fedmsg-core-0.16.2-1.fc23.noarch
m2crypto-0.22.5-2.fc23.x86_64
python-m2ext-0.1-7.fc23.x86_64

It seems to run fine for a while and then, every so often, blows up at this point in the validate() function in x509.py:

# Load and check against the CRL
crl = _load_remote_cert(
    config.get('crl_location', 'https://fedoraproject.org/fedmsg/crl.pem'),
    config.get('crl_cache', '/var/cache/fedmsg/crl.pem'),
    config.get('crl_cache_expiry', 1800),
    **config)
crl = M2Crypto.X509.load_crl(crl)

I can't immediately see why. When I look after seeing a crash, /var/run/fedmsg/crl.pem is there, and if I try, I can load it manually with import M2Crypto; M2Crypto.X509.load_crl('/var/run/fedmsg/crl.pem').

Here are the tracebacks I've got:

Traceback (most recent call last):
  File "/usr/bin/check-compose-fedmsg", line 77, in <module>
    main()
  File "/usr/bin/check-compose-fedmsg", line 68, in main
    for (name, endpoint, topic, msg) in fedmsg.tail_messages(mute=True, **config):
  File "/usr/lib/python2.7/site-packages/fedmsg/__init__.py", line 109, in tail_messages
    for item in __local.__context.tail_messages(**kw):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 325, in tail_messages
    for msg in self._poll(poller, subs):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 397, in _poll
    yield self._run_socket(s, name, ep, watched_names)
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 409, in _run_socket
    if not validate or fedmsg.crypto.validate(msg, **self.c):
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/__init__.py", line 256, in validate
    return backend.validate(message, **cfg)
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/x509.py", line 149, in validate
    crl = M2Crypto.X509.load_crl(crl)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 1101, in load_crl
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140197791061760:error:0906D066:PEM routines:PEM_read_bio:bad end line:pem_lib.c:809:

That one's shown up twice. This one has only happened once:

Traceback (most recent call last):
  File "/usr/bin/check-compose-fedmsg", line 77, in <module>
    main()
  File "/usr/bin/check-compose-fedmsg", line 68, in main
    for (name, endpoint, topic, msg) in fedmsg.tail_messages(mute=True, **config):
  File "/usr/lib/python2.7/site-packages/fedmsg/__init__.py", line 109, in tail_messages
    for item in __local.__context.tail_messages(**kw):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 325, in tail_messages
    for msg in self._poll(poller, subs):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 397, in _poll
    yield self._run_socket(s, name, ep, watched_names)
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 409, in _run_socket
    if not validate or fedmsg.crypto.validate(msg, **self.c):
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/__init__.py", line 256, in validate
    return backend.validate(message, **cfg)
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/x509.py", line 149, in validate
    crl = M2Crypto.X509.load_crl(crl)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 1101, in load_crl
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140544838350592:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: X509 CRL

They're similar, but the first is "bad end line" and the second is "no start line".

AdamWill · 2016-03-04T19:13:25Z

So I think @ralphbean nailed this on the first try:

<threebean> adamw: well - are they running on the same host by chance?
<threebean> they could be stampedeing all over the file.

I've written three naive fedmsg consumers for handling some QA stuff in the short term (ideally we should make them into taskotron tasks in the long term, I think), and we have a couple of hosts where two of them are running at the same time. It does seem like this only started happening after I started running two naive consumers at the same time. So I suspect Ralph may be right and this happens when both of them happen to try and do something to the same CRL cache file at the same time.

Adding locking for writing to that file might be a good idea, I guess. I'm gonna see if I can revise my consumers to use the fedmsg hub (as recommended in the docs) instead of tail_messages.
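For illustration, here is a minimal sketch of what locking around the cache write could look like. It is not fedmsg's code: the helper name write_cache_locked and the ".lock" sidecar file are made up for this example, and it assumes a POSIX system where fcntl.flock is available. It takes an exclusive advisory lock so only one cooperating process writes at a time, and it writes to a temporary file and renames it into place, so a process reading the cache sees either the old file or the new one, never a half-written one.

```python
import fcntl
import os
import tempfile


def write_cache_locked(path, data):
    """Replace the file at `path` with `data` (bytes) under an exclusive lock.

    Hypothetical helper for illustration; not part of fedmsg.
    """
    lock_path = path + '.lock'  # assumed sidecar lock file next to the cache
    with open(lock_path, 'a') as lock:
        # Advisory lock: blocks until no other cooperating process holds it.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            directory = os.path.dirname(path) or '.'
            # Write to a temp file in the same directory so the rename
            # below stays on one filesystem.
            fd, tmp = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, 'wb') as f:
                    f.write(data)
                    f.flush()
                    os.fsync(f.fileno())
                # On POSIX, rename atomically replaces the destination.
                os.rename(tmp, path)
            except Exception:
                os.unlink(tmp)
                raise
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With this shape, readers can keep opening the cache path without taking the lock, because the rename swaps in a complete file. The lock only helps if every writer uses it, since flock is advisory.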

I'm switching the misc. QA fedmsg consumers over to using fedmsg-hub, due to fedora-infra/fedmsg#365. So we need to adjust how we install check-compose, install a config file to enable the consumer, and also set up the fedmsg base and hub roles on the openqa server boxes (which do the check-compose job ATM).

This also changes how they are cached in order to better handle expired or rotated CRLs/CAs. If the CA/CRL is a file, it is read into a cache and used until a message fails validation. At that point, the cache is invalidated and the CA/CRL is reloaded. If the message still fails validation, we mark it as invalid and continue. If the CA/CRL is a URL, the file is downloaded and cached in memory just like the file approach.

It would be nice if the process halted when a fatal error was encountered (like the CRL being expired), but unfortunately there's no way to communicate that to moksha. Once we drop moksha we can do that with a set of fedmsg exceptions, but for now logging at the error level is the only thing we can do.

fixes fedora-infra#481
fixes fedora-infra#365

Signed-off-by: Jeremy Cline <[email protected]>
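The cache-then-reload behaviour described in the commit message above can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the class name CachedValidator and the load and check callables are assumptions made up for the example, standing in for "read the CA/CRL from a file or URL" and "validate a message against it".

```python
import logging

log = logging.getLogger(__name__)


class CachedValidator(object):
    """Keep CA/CRL data in memory and reload it once when validation fails.

    Hypothetical sketch; `load` and `check` are placeholders supplied by
    the caller, not fedmsg APIs.
    """

    def __init__(self, load, check):
        self._load = load    # callable() -> CA/CRL data
        self._check = check  # callable(message, data) -> bool
        self._cache = None

    def validate(self, message):
        fresh = self._cache is None
        if fresh:
            self._cache = self._load()
        if self._check(message, self._cache):
            return True
        if not fresh:
            # The cached data may be expired or rotated: invalidate the
            # cache, reload once, and retry.
            self._cache = self._load()
            if self._check(message, self._cache):
                return True
        # Still failing with fresh data: mark as invalid and carry on.
        log.error('message failed validation after reload')
        return False
```

Because the data lives in memory in each process, two consumers on the same host no longer write to a shared cache file, which is what sidesteps the race from the original report.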

ralphbean changed the title ~~Periodic crashes in M2Crypto.X509.load_crl(crl)~~ Add file-based locking to CRL modification Mar 4, 2016

abompard added the bug label Nov 3, 2016

jeremycline mentioned this issue Sep 25, 2017

crl handling improvements #481

Closed

jeremycline mentioned this issue Sep 29, 2017

Allow the CA and CRL to be file paths #484

Merged

jeremycline closed this as completed in #484 Oct 9, 2017
