Add file-based locking to CRL modification #365

Closed
AdamWill opened this issue Mar 4, 2016 · 1 comment · Fixed by #484

AdamWill commented Mar 4, 2016

I'm running this fairly simple listener:

https://git.fedorahosted.org/cgit/fedora-qa.git/tree/check-compose/check-compose-fedmsg.in

to generate the 'compose check' emails. It's running on a Fedora 23 system, with:

python-fedmsg-core-0.16.2-1.fc23.noarch
m2crypto-0.22.5-2.fc23.x86_64
python-m2ext-0.1-7.fc23.x86_64

It seems to run fine for a while, then every so often it blows up at this point in the validate() function of x509.py:

# Load and check against the CRL
crl = _load_remote_cert(
    config.get('crl_location', 'https://fedoraproject.org/fedmsg/crl.pem'),
    config.get('crl_cache', '/var/cache/fedmsg/crl.pem'),
    config.get('crl_cache_expiry', 1800),
    **config)
crl = M2Crypto.X509.load_crl(crl)

I can't immediately see why. When I check after a crash, /var/run/fedmsg/crl.pem is there, and I can load it manually with import M2Crypto; M2Crypto.X509.load_crl('/var/run/fedmsg/crl.pem').

Here are the tracebacks I've got:

Traceback (most recent call last):
  File "/usr/bin/check-compose-fedmsg", line 77, in <module>
    main()
  File "/usr/bin/check-compose-fedmsg", line 68, in main
    for (name, endpoint, topic, msg) in fedmsg.tail_messages(mute=True, **config):
  File "/usr/lib/python2.7/site-packages/fedmsg/__init__.py", line 109, in tail_messages
    for item in __local.__context.tail_messages(**kw):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 325, in tail_messages
    for msg in self._poll(poller, subs):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 397, in _poll
    yield self._run_socket(s, name, ep, watched_names)
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 409, in _run_socket
    if not validate or fedmsg.crypto.validate(msg, **self.c):
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/__init__.py", line 256, in validate
    return backend.validate(message, **cfg)
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/x509.py", line 149, in validate
    crl = M2Crypto.X509.load_crl(crl)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 1101, in load_crl
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140197791061760:error:0906D066:PEM routines:PEM_read_bio:bad end line:pem_lib.c:809:

That one has shown up twice. This one has only happened once:

Traceback (most recent call last):
  File "/usr/bin/check-compose-fedmsg", line 77, in <module>
    main()
  File "/usr/bin/check-compose-fedmsg", line 68, in main
    for (name, endpoint, topic, msg) in fedmsg.tail_messages(mute=True, **config):
  File "/usr/lib/python2.7/site-packages/fedmsg/__init__.py", line 109, in tail_messages
    for item in __local.__context.tail_messages(**kw):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 325, in tail_messages
    for msg in self._poll(poller, subs):
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 397, in _poll
    yield self._run_socket(s, name, ep, watched_names)
  File "/usr/lib/python2.7/site-packages/fedmsg/core.py", line 409, in _run_socket
    if not validate or fedmsg.crypto.validate(msg, **self.c):
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/__init__.py", line 256, in validate
    return backend.validate(message, **cfg)
  File "/usr/lib/python2.7/site-packages/fedmsg/crypto/x509.py", line 149, in validate
    crl = M2Crypto.X509.load_crl(crl)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/X509.py", line 1101, in load_crl
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140544838350592:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: X509 CRL

They're similar, but the first is "bad end line" and the second is "no start line".
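
Both messages would be consistent with a reader picking up the cache file while it is empty or only partly written. The following is a rough pure-Python illustration of that idea, not what OpenSSL or M2Crypto actually do internally; the function name pem_framing_problem is hypothetical.

```python
def pem_framing_problem(text, label="X509 CRL"):
    """Return None if ``text`` has intact PEM framing, else a short description.

    Illustration only: this checks just the BEGIN/END marker lines.
    """
    begin = "-----BEGIN %s-----" % label
    end = "-----END %s-----" % label
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if begin not in lines:
        # Covers an empty file, e.g. one just truncated for rewriting.
        return "no start line"
    if end not in lines[lines.index(begin) + 1:]:
        # Covers a file whose tail has not been written yet.
        return "bad or missing end line"
    return None
```

A file that has just been truncated for rewriting falls into the first case, and one whose tail has not been written yet falls into the second.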


AdamWill commented Mar 4, 2016

So I think @ralphbean nailed this on the first try:

<threebean> adamw: well - are they running on the same host by chance?
<threebean> they could be stampeding all over the file.

I've written three naive fedmsg consumers for handling some QA stuff in the short term (ideally we should turn them into taskotron tasks in the long term, I think), and we have a couple of hosts where two of them run at the same time. This does seem to have started only after I began running two naive consumers side by side. So I suspect Ralph is right, and the crash happens when both of them try to read or write the same CRL cache file at the same moment.

Adding locking for writing to that file might be a good idea, I guess. I'm gonna see if I can revise my consumers to use the fedmsg hub (as recommended in the docs) instead of tail_messages.
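
A minimal sketch of what such locking could look like, assuming a POSIX system and Python 3. This is not fedmsg's code: the function name write_cache_atomically and the ".lock" suffix are hypothetical. It takes an exclusive fcntl.flock lock so that cooperating writers take turns, and writes to a temporary file that is then renamed into place, so a reader should see either the old file or the new one, never a partial one.

```python
import fcntl
import os
import tempfile


def write_cache_atomically(cache_path, data):
    """Write ``data`` (bytes) to ``cache_path`` without exposing a partial file."""
    directory = os.path.dirname(cache_path) or "."
    lock_path = cache_path + ".lock"  # hypothetical lock-file name
    with open(lock_path, "w") as lock_file:
        # Block until no other cooperating process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Create the temp file in the same directory so the final
            # rename stays on one filesystem.
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".crl-")
            try:
                with os.fdopen(fd, "wb") as tmp:
                    tmp.write(data)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                # Atomic on POSIX when source and target share a filesystem.
                os.replace(tmp_path, cache_path)
            except BaseException:
                # Clean up the temp file if anything went wrong.
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Note that flock is advisory: it only helps if every process that writes the cache goes through the same function. The rename is what protects readers that take no lock at all.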

ralphbean changed the title from "Periodic crashes in M2Crypto.X509.load_crl(crl)" to "Add file-based locking to CRL modification" Mar 4, 2016
henrysher pushed a commit to henrysher/fedora-infra-ansible that referenced this issue Mar 4, 2016
I'm switching the misc. QA fedmsg consumers over to using fedmsg-hub,
due to fedora-infra/fedmsg#365.
So we need to adjust how we install check-compose, install a
config file to enable the consumer, and also set up the fedmsg
base and hub roles on the openqa server boxes (which do the
check-compose job ATM).
abompard added the bug label Nov 3, 2016
jeremycline added a commit to jeremycline/fedmsg that referenced this issue Sep 29, 2017
This also changes how they are cached in order to better handle expired
or rotated CRLs/CAs.

If the CA/CRL is a file, it is read into a cache and used until a
message fails validation. At that point, the cache is invalidated and
the CA/CRL is reloaded. If the message still fails validation, we mark
it as invalid and continue.

If the CA/CRL is a URL, the file is downloaded and cached in memory just
like the file approach.

It would be nice if the process halted when a fatal error was
encountered (like the CRL being expired), but unfortunately there's no
way to communicate that to moksha. Once we drop moksha we can do that
with a set of fedmsg exceptions, but for now logging at the error level
is the only thing we can do.

fixes fedora-infra#481
fixes fedora-infra#365

Signed-off-by: Jeremy Cline <[email protected]>
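
The caching behaviour described in the commit message above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: CachedResource, validate_with_retry, and the loader and check callables are all hypothetical names. The idea is to keep the CA/CRL in memory, and on a validation failure drop the cached copy, reload it once, and try again before declaring the message invalid.

```python
class CachedResource(object):
    """Keep a loaded CA/CRL in memory until it is explicitly invalidated."""

    def __init__(self, loader):
        # Hypothetical callable that reads the file or downloads the URL.
        self._loader = loader
        self._value = None

    def get(self):
        # Load lazily, then reuse the in-memory copy.
        if self._value is None:
            self._value = self._loader()
        return self._value

    def invalidate(self):
        self._value = None


def validate_with_retry(message, cache, check):
    """Return True if ``check(message, crl)`` passes, reloading the CRL once."""
    if check(message, cache.get()):
        return True
    # The cached copy may be expired or rotated: reload and try once more.
    cache.invalidate()
    return check(message, cache.get())
```

Because nothing is written to a shared file on disk in this approach, two consumers on the same host no longer have a cache file to collide on.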
jeremycline added a commit to jeremycline/fedmsg that referenced this issue Oct 2, 2017
This also changes how they are cached in order to better handle expired
or rotated CRLs/CAs.

If the CA/CRL is a file, it is read into a cache and used until a
message fails validation. At that point, the cache is invalidated and
the CA/CRL is reloaded. If the message still fails validation, we mark
it as invalid and continue.

If the CA/CRL is a URL, the file is downloaded and cached in memory just
like the file approach.

It would be nice if the process halted when a fatal error was
encountered (like the CRL being expired), but unfortunately there's no
way to communicate that to moksha. Once we drop moksha we can do that
with a set of fedmsg exceptions, but for now logging at the error level
is the only thing we can do.

fixes fedora-infra#481
fixes fedora-infra#365

Signed-off-by: Jeremy Cline <[email protected]>