-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression: Cannot use cyrptography in embedded wsgi #2299
Comments
Wow, would not have expected it to come from that... /cc @glyph |
Me either, the commit itself doesn't look like it could cause something like this, I think that it may have been an earlier commit that introduced code paths that weren't called until the triggering commit. |
Out of curiosity... Calling |
Yes, those two lines also resolve the issue. |
OK, so here's a hypothesis: This is a problem because sub-interpreters have an entirely different Python namespace (different modules, etc) but the same C namespace (same GIL, same C-level global variables, shared libraries, etc). I'm not sure exactly why initializing the engine a second time causes a problem, but it's C-level runtime initialization stuff, so ¯_(ツ)_/¯. I'm sure that cryptography being surprised to discover an already-initialized OpenSSL causes all kinds of exciting issues. @reaperhulk - you should like this one since it would also explain some of the other concurrency issues you're seeing at import time. (The main hypothesis I guess I'm describing here is that subinterpreters are just a completely broken concept nobody should use, but maybe cryptography could be made to work around these issues.) |
If we work on that assumption we can make engine addition idempotent at the C layer (see #2293) but since the random engine has callbacks in Python (that potentially point to functions not in the "current" sub-interpreter) it would still fail in the sub-interpreter...correct? |
There's a pretty good explanation here - http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#but-c-extensions-are-not-isolated - of how C extensions are treated by |
I am pretty sure that you can call between different sub-interpreters just fine, since they share a GIL. |
I flied a cffi bug to ask for a way to keep the GIL while calling C: https://bitbucket.org/cffi/cffi/issues/220/some-way-to-annotate-a-call-to-ask-it-not |
One perhaps-obvious solution here would be to move all this library initialization to the module scope so that it's protected by the import lock, rather than trying to protect it with an ad-hoc lock. It seems counter to the way that cryptography is constructed, to avoid shared mutable state, but in this case there is just the illusion of such avoidance: the mutable state does in fact exist and attempting to abstract it away is hard in a multi-interpreter environment. |
Sub interpreters could definitely trigger a race in the engine setup code. #2293 proposes making this process idempotent which seems like a fairly reasonable solution to me. |
@glyph I'm not opposed to that solution if it resolves our problem, but does it given the change in behavior in python 3.4? |
Possibly not; I can't figure out at a glance how the per-module locks are initialized. @cyli suggests that cryptography could just take out a filesystem lock in |
Okay, reviving this conversation. Proposal for the sake of discussion: To avoid issues with sub-interpreters we should do the following...
Do people agree that this would resolve the issue? If not, why not? And if so, is there a way to do this without doing it all in C because obviously we'd prefer to not do that... |
FWIW, I'm strictly opposed to moving this code back itno C. On Tue, Sep 8, 2015 at 11:05 AM, Paul Kehrer [email protected]
"I disapprove of what you say, but I will defend to the death your right to |
Just because having more code in C is bad, or...? |
Pretty much :-) C is bad, and going back to our C code is a regression in functionality and On Tue, Sep 8, 2015 at 3:41 PM, Glyph [email protected] wrote:
"I disapprove of what you say, but I will defend to the death your right to |
Alex what regression in security do you see ? |
In order to properly initialize the random engine and locking callbacks, Cryptography needs to hold a lock so that these initializations are mutually exclusive between interpreters. I'm going to ignore the potential for concurrency among backends ( Here are some of the strategies I've discussed on IRC or in discussions with Cryptography maintainers: Use the GILCFFI presently offers no capability of holding the GIL; we would really like at least a temporary solution before that is likely to land. Not to mention that every new-CFFI-version requirement means a new pypy requirement, practically speaking, which is pretty aggressive. Plus, this only works on the C side of things. In Python, every bytecode executed might release the GIL, so it's no help there. Allocate our own locks in CryptographyThis is what Cryptography is trying to do now. The problem with acquiring such a lock is that there is nowhere to share such a lock. And a lock you can't share isn't exactly a lock. Any lock allocated in Python code is going to be allocated in multiple interpreters. If there were an API in C for allocating locks, then we could use that on the C side and then acquire that from the Python side. But there is no such API in C; POSIX locks and Windows locks are fairly different. We could of course use Python's API for this, but starting to rely on cpyext at this point in cryptography's history would seem like even more of a backslide than going back to C code. Re-use the global import lockAs I mentioned above, Python has a global import lock. What I didn't mention above (although I did on another ticket) is that there is an API, The API is deprecated externally, but the very definitely not deprecated ConclusionI think that the best way forward would be to use the import lock in the near term, and in the long term work with CFFI to have logic specifically dedicated to dealing with sub-interpreter concurrency by exposing some form of portable lock initialization, and then using that. Or maybe pursuade CPython that exposing the import lock is necessary for the long term. Or possibly even allowing Python code to run in |
Python provides APIs for portable locking; C doesn't. So it seems like a reasonable place to handle it, if we can manage it. |
Using the import lock like that sounds like a reasonable safeguard to me, at least while it remains available in implementations we support. It's not entirely obvious to me that the current approach to idempotency is in fact invalid but I guess that doesn't really matter if we have a real mutex we can use. On a slightly different note, this discussion made me look at
|
My reading of that code is that one subinterpreter will "win" and initialize the locks with its lock object, and then all other subinterpreters will get a reference to that via the pointers in openssl itself. So I think it's still correct, inasmuch as any code can be "correct" in the face of this mess… |
@glyph I think you are right about how the callbacks works means that whichever one wins the race will be the one that the callbacks are executed in. Unfortunately I think there's still a problem since sub interpreters have independent lifetimes. If the one that won the race exits the other interpreter will no longer have a valid locking callback to use. This same problem applies to osrandom too. It's not really very obvious to me that merely getting initial allocation of the callbacks safe solves the sub interpreter problem. |
@public I don't think that tearing down a subinterpreter will automatically destroy all objects within that subinterpreter. It might do something like clearing module dictionaries, but it relies upon GC to free objects. I suppose someone could write a proof of concept C program 😩 |
As @glyph notes, we actually still need the global import lock to protect the management of the per-module locks, so using that as at least a temporary workaround is reasonable. For Python 3.5+, cffi would ideally support using multi-phase initialization which has been designed to improve extension module subinterpreter support: https://docs.python.org/3.5/c-api/module.html#multi-phase-initialization That unfortunately won't be immediately helpful in cryptography's case, since the problem you have is one that is still an open question for multi-phase init: making it easy to handle external singleton libraries like OpenSSL in a subinterpreter compatible way (or at least in a way that fails noisily rather than introducing subtle data sharing problems - Cython opts for the latter approach by explicitly not supporting subinterpreters at all). Further work on this is planned for 3.6, so if anyone from cryptography and/or cffi has time to participate, we'd love to hear from you on import-sig: https://mail.python.org/mailman/listinfo/import-sig |
The "global" import lock is actually a C global: https://hg.python.org/cpython/file/tip/Python/import.c#l157 ...but that may change. While the motivation for the lock is thread races in the same interpreter, C extensions benefit (perhaps incidentally) in the case of subinterpreters. I'm glad this discussion came up because I expect at some point de-globalizing the lock between interpreters may happen as we try to further isolate interpreters in the same process. It's clear that the C extension case will have to be satisfied safely. That may not be the 3.6 timeframe, but I could see it happening by 3.7 (as we make headway on isolation efforts). Also... |
...the C extension support in subinterpreters to which Nick refers should help. Currently C extension modules can define C-level state (instead of using C globals); see https://www.python.org/dev/peps/pep-3121/. Perhaps it would help to support some form of process-global per-module state. |
@reaperhulk, I'll check on Monday when I get to work. |
@reaperhulk, it appears #2336 does not fix my issue. I'm also having trouble reducing the project to a minimal test case: I can reproduce the bug on the full project, but not on a single file which imports cryptography. |
@Naddiseo d'oh :( Is the project something where you could build a docker container we could use to test potential solutions or is it something proprietary? Subinterpreters are very frustrating... |
It is proprietary, so anything other than a minimal test case is a no-go. I'll continue to try and make a test case in my spare cycles. In the mean time, if you do have other changes you'd like to test, you can point me to a branch/commit and I'll give it a test. |
@reaperhulk I have a similiar problem (running on py27, cryptography 1.0 in a apache2 wsgi env and getting |
@toabctl that's great -- any chance you can provide a docker container that replicates the bug? We haven't been willing to land #2293 thus far because it relies on locking callbacks that are registered in Python. I think #2293 is really just narrowing the window for the race condition and not fixing the underlying issue, but I'd like to experiment with an environment that triggers the initial bug. |
No. Sorry for that. I can paste the wsgi file and the apache config if that helps. |
@reaperhulk, no joy with #2293 |
I noticed a similar issue with to combination of apache, mod_wsgi and the Requests module. When doing a request to a HTTPS site with Requests, the function call never returns. I eventually figured it out because I saw the following error in my apache error log: For the code that was causing the issue, see: snok/django-auth-adfs@28936fb --- UPDATE 2016/11/29 --- |
…ause of a bug in the cryptography module. ref pyca/cryptography/#2299
I'm going to close this as our current solution for Hopefully at some point in the future this won't be required, but for now it's the best we can do. |
Should the FAQ also mention #2537 since that is the (eventual) path forward on this issue? Or am I confused? |
Are there any downsides in using |
@glyph until we merge it I'm going to be somewhat quiet about it (other than to perhaps point people that direction). @jobec The random initialization issue appears to be mostly handled by some code we put in the 1.2 release, so it shouldn't be necessary to separately call |
Adding a note here as I ran into this problem and the documented workaround - setting WSGIProgressGroup %{global} - didn't work for me. I hope someone else will find this useful. From what I can glean, setting %{GLOBAL} doesn't assign the reference (e.g. WSGISCriptAlias) to the first sub-interpreter, but simply to the first process group. Sub-interpreters are created purely by the number of different wsgi applications that are in that process group, with "different application" is roughly equivalent to a different wsgi.py file. So, assuming that diagnosis true, the documented solution doesn't work at all and is in fact completely wrong. I added some code to the wsgi.py file to syslog out id(os) and os.getpid(), as a way of detecting additional sub-interpreters being generated. Using id(builtins) didn't seem to be useful. The fix was to ensure each application runs in a different process group, I.e., a different WSGIDaemonProcess for every different wsgi.py file you have. When I did this, this problem, along with the 'isinstance' problem the WSGI documentation referred to, completely went away. (As an aside, I only have a single wsgi.py file, but I'm referring to them from two different VirtualServers. I think that whatever code detects references to the same wsgi.py file breaks across the virtualserver boundary. I have multiple references to the wsgi.py file inside one of the VirtualServers, and that doesn't appear to cause additional sub-interpreters.) I hope this helps! |
I ran into this bug again (v1.4 of cryptography). First with an error like: |
This issue will be resolved with the next cryptography release. See #3229 |
We have discovered at $work that the 1.0 cryptography release does not work with apache2+mod_wsgi. The error log spits out "Truncated or oversized response headers received from daemon process", which seems to happen if the wsgi process dies. We've tracked the issue down to an upgrade with cryptography, and something to do with threading and/or the Python GIL (a work around is in the wsgi config to set WSGIApplicationGroup to
%{GLOBAL}
).I did a git bisect from 0.9.3 and 1.0, and found that the issue was introduced in commit b3d37a5.
The text was updated successfully, but these errors were encountered: