Stack overflow fix #236

cmazakas · 2024-12-12T19:09:32Z

Blocked on: #235

Fuzzing found an interesting pathological case in the regex creator. In the backstep calculation, we can introduce unbounded recursion which causes a stack overflow (in this case, only on clang).

I tried coming up with better iterative solutions but they weren't particularly satisfactory and I'm not overwhelmingly satisfied with my approach here either so I'm looking for a lot of feedback.

The shape of the regex here is something like ||||||... repeated ad nauseum. It was hard to reliably reproduce without just including the actual fuzzing data so I added it to the corpus.

I also took the liberty of modernizing the fuzzing jamfiles because they seemed outdated to me. libFuzzer was superceded by Centipede which became part of the official Google repos but it has a different API, and nowadays you can just use: clang -fsanitize=fuzzer anyway which does exactly what we want.

jzmaddock · 2024-12-15T11:54:40Z

Self contained test case:

#include <boost/regex.hpp>

int main()
{
   char const s[] = "(?<=(||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||u||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||))";
   boost::regex rx(s);
}

jzmaddock · 2024-12-15T12:19:54Z

All of the alternatives are unpleasant - on the one hand your fix is perfectly reasonable - I can think of no really good reason for a very large number of alternatives in a look-behind. My only worry, is that - like you I'm sure - I dislike hard coded limits, what happens for example if we're on embedded hardware with a very small stack? Removing the second recursive call is not too hard (it's almost tail recursion after all), but still leaves us open to infinite recursion on the first call via nested alternatives such as:

(?<=((|(|(|(|(|(|))))))

Let me have a think.... there will be much much harder cases than this BTW!!

cmazakas · 2024-12-16T16:09:17Z

One of my first-pass thoughts was to simply turn this from recursion into iteration, but mapping the control flow proved difficult.

I think we need a sane limit regardless, because even if we did avoid stack overflows, we still want to prevent an every-growing usage of memory.

Maybe I should just give it another shot to make the solution iterative here.

Refs #236.

jzmaddock · 2024-12-16T17:16:12Z

Like this: #237 ? It removes the recursion, and interestingly the stack never actually grows large unless you start having nested alternatives.

cmazakas · 2024-12-17T15:28:52Z

Like this: #237 ? It removes the recursion, and interestingly the stack never actually grows large unless you start having nested alternatives.

Ha, yeah, exactly like that.

I think we need both a reasonable implementation limit and we also need a way of avoiding an ever-growing stack. Let's see if we can't spruce up the existing PR.

cmazakas · 2024-12-17T23:24:15Z

@jzmaddock I took a closer look at the PR you had created, and it looked perfect to me. Seems to solve the issue too, the code seems to pass with the corpus entry plucked from oss-fuzz.

If you wanna merge, you should feel free to do so. I tried to do something similar to yours, but my approach was way less clean so I'm very impressed by your technique.

Btw, when do you think we'll start merging these fixes to the master branch?

jzmaddock · 2024-12-18T17:08:37Z

OK merged. Thanks.

I'm quite bad about merging to master in good time... might be an idea to merge sooner rather than later if only to check that the Fuzzer issues get auto-closed.

jzmaddock · 2024-12-18T17:09:14Z

Closing in favour of #237

cmazakas added 5 commits December 9, 2024 10:16

add clang 16-18 with sanitization

b88715e

fix negation operator creating an enum with an invalid value

f4129c0

use manual pointer arithmetic to avoid invalid pointer cast

26b0853

fix stack overflow found by fuzzing

1cc781b

update fuzzing jamfile for modern clang and fuzzers

c3f1251

jzmaddock added a commit that referenced this pull request Dec 16, 2024

Make backstep calculation non-recursive.

57ca082

Refs #236.

jzmaddock mentioned this pull request Dec 16, 2024

Make backstep calculation non-recursive. #237

Merged

jzmaddock closed this Dec 18, 2024

cmazakas deleted the feature/fix-stack-overflow branch December 18, 2024 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stack overflow fix #236

Stack overflow fix #236

cmazakas commented Dec 12, 2024

jzmaddock commented Dec 15, 2024

jzmaddock commented Dec 15, 2024

cmazakas commented Dec 16, 2024

jzmaddock commented Dec 16, 2024

cmazakas commented Dec 17, 2024

cmazakas commented Dec 17, 2024

jzmaddock commented Dec 18, 2024

jzmaddock commented Dec 18, 2024

Stack overflow fix #236

Stack overflow fix #236

Conversation

cmazakas commented Dec 12, 2024

jzmaddock commented Dec 15, 2024

jzmaddock commented Dec 15, 2024

cmazakas commented Dec 16, 2024

jzmaddock commented Dec 16, 2024

cmazakas commented Dec 17, 2024

cmazakas commented Dec 17, 2024

jzmaddock commented Dec 18, 2024

jzmaddock commented Dec 18, 2024