This directory contains programs to extract the statically declared regexes from a program written in any of the supported languages.
The driver extract-regexes.pl accepts a JSON file with:
- file (program name)
- [language]
If language is not specified, the driver attempts to discover the correct language.
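
As a rough illustration of the input described above, the sketch below builds such a JSON file in Python. The helper names (make_request, write_request) and the example language value are assumptions made for this example; the language names the driver actually accepts are not listed here.

```python
import json


def make_request(file, language=None):
    """Build the driver's input object; 'language' is optional."""
    request = {"file": file}
    if language is not None:
        # Assumption: the language is given as a plain string.
        request["language"] = language
    return request


def write_request(path, file, language=None):
    """Write the input object to 'path' as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(make_request(file, language), fh)
```

Leaving out the language argument produces an object with only the file key, which corresponds to the case where the driver has to discover the language itself.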
The most straightforward way to write an extractor is:
- Load the source code of the program.
- Build an AST.
- Walk the AST looking for regex declaration nodes.
- Collect the patterns.
- Print.
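
The steps above can be sketched for Python source using the standard ast module. This is a minimal illustration, not one of the extractors in this directory: it only recognises calls written as re.<function>(...), and the name extract_patterns and the RE_FUNCS set are choices made for this example.

```python
import ast

DYNAMIC = "DYNAMIC-PATTERN"

# Functions of Python's re module whose first argument is a pattern.
RE_FUNCS = {"compile", "search", "match", "fullmatch", "split",
            "findall", "finditer", "sub", "subn"}


def extract_patterns(source):
    """Return the patterns passed to re.<func>(...) calls in 'source'."""
    tree = ast.parse(source)          # build the AST
    patterns = []
    for node in ast.walk(tree):       # walk it (order is unspecified)
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_re_call = (isinstance(func, ast.Attribute)
                      and isinstance(func.value, ast.Name)
                      and func.value.id == "re"
                      and func.attr in RE_FUNCS)
        if not is_re_call or not node.args:
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            patterns.append(arg.value)    # statically declared string literal
        else:
            patterns.append(DYNAMIC)      # variable, concatenation, etc.
    return patterns
```

Aliased imports (for example "from re import compile") and patterns passed by keyword are not detected by this sketch; a real extractor would need to decide how to treat them.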
If no AST generator is available, you can also extract regexes with a custom "parser" that targets the use of regexes.
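
A custom "parser" of this kind might look like the following sketch, which scans Java source text for Pattern.compile(...) calls. It is a heuristic written for illustration only: it can misfire inside comments or string literals, it returns the pattern exactly as written in the source (Java escapes are not decoded), and the names CALL, LITERAL, and scan_java are hypothetical.

```python
import re

DYNAMIC = "DYNAMIC-PATTERN"

# Start of a call: Pattern.compile(
CALL = re.compile(r'Pattern\.compile\s*\(\s*')
# A single double-quoted literal that is the whole first argument,
# i.e. it is followed directly by ',' or ')'.
LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"\s*[,)]')


def scan_java(source):
    """Return one entry per Pattern.compile call found in 'source'."""
    patterns = []
    for call in CALL.finditer(source):
        literal = LITERAL.match(source, call.end())
        if literal:
            patterns.append(literal.group(1))   # text between the quotes
        else:
            patterns.append(DYNAMIC)            # anything else is dynamic
    return patterns
```

Splitting the call detection from the literal check keeps each expression simple: any first argument that is not exactly one string literal falls through to the dynamic case.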
Adding an extractor for a new language is easy!
- Identify a not-yet-supported programming language.
- Write a program that accepts as input a file name.
- Statically extract all regexes in this file. If a regex is dynamically defined then use the special value "DYNAMIC-PATTERN".
- Emit (to STDOUT) in JSON an object with:
  - key file (name)
  - key couldParse (0 or 1)
  - if couldParse, a key regexes with value an array whose elements are regex instances: objects with keys pattern, [flag]
- Add appropriate routing to extract-regexps.pl.
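
The output object described in the steps above could be assembled as in this sketch. The helper names build_result and emit are hypothetical, and the representation of the optional flag value as a string is an assumption, since its format is not specified here.

```python
import json
import sys


def build_result(file, patterns=None):
    """Build the extractor's output object.

    'patterns' is a list of (pattern, flag) pairs, or None when the
    file could not be parsed. A flag of None means "no flag".
    """
    if patterns is None:
        # Parsing failed: no regexes key is emitted.
        return {"file": file, "couldParse": 0}
    regexes = []
    for pattern, flag in patterns:
        entry = {"pattern": pattern}
        if flag is not None:
            entry["flag"] = flag      # assumption: flag is a string
        regexes.append(entry)
    return {"file": file, "couldParse": 1, "regexes": regexes}


def emit(result):
    """Print the result object to STDOUT as JSON."""
    json.dump(result, sys.stdout)
    sys.stdout.write("\n")
```

A dynamically defined regex would be passed as the pair ("DYNAMIC-PATTERN", None), so it appears in the regexes array like any other instance.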