This directory contains vulnerable regex detectors. These detectors identify regexes that are vulnerable to catastrophic backtracking.
The detect-vuln.pl
driver accepts:
- 'pattern' (regex pattern to test)
- ['detectors'] (array of names of detectors to query)
- ['timeLimit'] (in seconds, time granted to each detector)
- ['memoryLimit'] (in MB, memory granted to each detector)
and queries the requested detectors on the regex while applying per-detector time and memory limits. It prints a summary in JSON to STDOUT.
See usage message for details.
- After you run
./configure
, thesrc/detectors/
dir contains a built version of each detector. - The
src/drivers
dir contains a driver for each detector. This layer letsdetect-vuln.pl
query detectors with a uniform interface.
The regex engine in most languages is what's called a backtracking engine. It builds a non-deterministic finite automata (NFA) to check whether the input string matches the regex. To implement non-deterministic choices, a backtracking engine saves the current state, then makes one of the possible choices. It pushes the state onto a stack of possible choices.
If the choice results in a match, it returns.
If the choice results in a mismatch, it pops off of the stack of possible choices and tries a different choice. This is called backtracking. When it runs out of possible choices, it returns "mismatch".
Catastrophic backtracking is a condition in which the regex engine backtracks super-linearly in the size of the input.
The degree of backtracking can be polynomial (e.g. O(n^2)
) or in the worst case exponential (O(2^n)
).
We use detectors that propose evil input. This excludes heuristics like star height, as popularized by the safe-regex project.
These are the detectors we use:
- RXXR2 (Rathnayake and Thielecke). Project homepage, paper.
- REXPLOITER (Wustholz, Olivo, Heule, and Dillig). JAR, paper.
- RegexStaticAnalysis (Weideman, van der Merwe, Berglund, and Watson). Project homepage, paper, thesis.
- ReScue (Shen, Jiang, Xu, Yu, Ma, and Lu). Project homepage, paper.
Writing a detector is difficult. I have no advice on this.
Adding an existing detector is easy!
- Create a git submodule for the detector.
- Add any configuration steps necessary (if no git submodule is possible, configuration should download before building it).
- Write a driver that accepts as input a file name whose contents are a JSON object with keys:
regex
. - Emit (to STDOUT) the result in JSON. This is an "opinion" object with keys:
canAnalyze
(0 or 1)isSafe
(0 or 1)evilInputs
: array of "evil input" objects with keys:pumpPairs
array of "pumpPair" objects with keys:prefix
,pump
suffix
- If the output from the detector is unparseable, the evil input object should be the string "COULD-NOT-PARSE".