-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack overflow fix #236
Stack overflow fix #236
Conversation
Self contained test case:
|
All of the alternatives are unpleasant - on the one hand your fix is perfectly reasonable - I can think of no really good reason for a very large number of alternatives in a look-behind. My only worry, is that - like you I'm sure - I dislike hard coded limits, what happens for example if we're on embedded hardware with a very small stack? Removing the second recursive call is not too hard (it's almost tail recursion after all), but still leaves us open to infinite recursion on the first call via nested alternatives such as:
Let me have a think.... there will be much much harder cases than this BTW!! |
One of my first-pass thoughts was to simply turn this from recursion into iteration, but mapping the control flow proved difficult. I think we need a sane limit regardless, because even if we did avoid stack overflows, we still want to prevent an every-growing usage of memory. Maybe I should just give it another shot to make the solution iterative here. |
Like this: #237 ? It removes the recursion, and interestingly the stack never actually grows large unless you start having nested alternatives. |
Ha, yeah, exactly like that. I think we need both a reasonable implementation limit and we also need a way of avoiding an ever-growing stack. Let's see if we can't spruce up the existing PR. |
@jzmaddock I took a closer look at the PR you had created, and it looked perfect to me. Seems to solve the issue too, the code seems to pass with the corpus entry plucked from oss-fuzz. If you wanna merge, you should feel free to do so. I tried to do something similar to yours, but my approach was way less clean so I'm very impressed by your technique. Btw, when do you think we'll start merging these fixes to the master branch? |
OK merged. Thanks. I'm quite bad about merging to master in good time... might be an idea to merge sooner rather than later if only to check that the Fuzzer issues get auto-closed. |
Closing in favour of #237 |
Blocked on: #235
Fuzzing found an interesting pathological case in the regex creator. In the backstep calculation, we can introduce unbounded recursion which causes a stack overflow (in this case, only on clang).
I tried coming up with better iterative solutions but they weren't particularly satisfactory and I'm not overwhelmingly satisfied with my approach here either so I'm looking for a lot of feedback.
The shape of the regex here is something like
||||||...
repeated ad nauseum. It was hard to reliably reproduce without just including the actual fuzzing data so I added it to the corpus.I also took the liberty of modernizing the fuzzing jamfiles because they seemed outdated to me. libFuzzer was superceded by Centipede which became part of the official Google repos but it has a different API, and nowadays you can just use:
clang -fsanitize=fuzzer
anyway which does exactly what we want.