The PCRE2 library is a set of C functions that implement regular expression pattern matching.
It is self-contained and portable, and designed to be easy to embed into existing projects and build systems, on almost any platform or build target.
The PCRE2 library is free and open-source (BSD licence), and may be used in proprietary software.
It supports Unicode matching and a very wide range of regular expression features. It accepts input in various character encodings, and optionally includes a highly performant JIT matching engine.
PCRE2 is mature and highly trusted: it is bundled in dozens or hundreds of open-source and commercial products, such as Excel, Safari, Apache, and Git, and is used as the basis for regular expressions in several programming languages, including PHP and R.
Website | |
---|---|
Distribution | |
Testing | |
Platforms | Tested continuously on Linux, Windows, macOS, FreeBSD, Solaris; x86, ARM, RISC-V, POWER, S390X; many others known to work |
# Fetch PCRE2 with 'git clone', or use curl/wget to download a release.
# Here, let's use git to check out a release tag:
git clone https://github.com/PCRE2Project/pcre2.git ./pcre2 \
--branch pcre2-$PCRE2_VERSION \
-c advice.detachedHead=false --depth 1
# Now let's build PCRE2:
(cd ./pcre2; \
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -B build; \
cmake --build build/)
# Great, PCRE2 is built.
# Here's a quick little demo to show how we can make use of PCRE2.
# For a fuller example, see './pcre2/src/pcre2demo.c'.
# Try this pre-prepared sample code:
cat demo.c
----------------------------------------------------------------------
File: demo.c
----------------------------------------------------------------------
/* Set PCRE2_CODE_UNIT_WIDTH to indicate we will use 8-bit input. */
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <string.h> /* for strlen */
#include <stdio.h> /* for printf */

int main(int argc, char* argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <pattern> <subject>\n", argv[0]);
        return 1;
    }
    const char *pattern = argv[1];
    const char *subject = argv[2];

    /* Compile the pattern. PCRE2_SPTR is a pointer to unsigned code units,
       so the char pointers are cast explicitly. */
    int error_number;
    PCRE2_SIZE error_offset;
    pcre2_code *re = pcre2_compile(
        (PCRE2_SPTR)pattern,   /* the pattern */
        PCRE2_ZERO_TERMINATED, /* indicates pattern is zero-terminated */
        0,                     /* default options */
        &error_number,         /* for error number */
        &error_offset,         /* for error offset */
        NULL);                 /* use default compile context */
    if (re == NULL) {
        fprintf(stderr, "Invalid pattern: %s\n", argv[1]);
        return 1;
    }

    /* Match the pattern against the subject text. */
    pcre2_match_data *match_data =
        pcre2_match_data_create_from_pattern(re, NULL);
    int rc = pcre2_match(
        re,                    /* the compiled pattern */
        (PCRE2_SPTR)subject,   /* the subject text */
        strlen(subject),       /* the length of the subject */
        0,                     /* start at offset 0 in the subject */
        0,                     /* default options */
        match_data,            /* block for storing the result */
        NULL);                 /* use default match context */

    /* Print the match result. */
    if (rc == PCRE2_ERROR_NOMATCH) {
        printf("No match\n");
    } else if (rc < 0) {
        fprintf(stderr, "Matching error\n");
    } else {
        PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(match_data);
        printf("Found match: '%.*s'\n", (int)(ovector[1] - ovector[0]),
               subject + ovector[0]);
    }

    pcre2_match_data_free(match_data); /* Free resources */
    pcre2_code_free(re);
    return 0;
}
----------------------------------------------------------------------
# Compile the demo:
gcc -g -I./pcre2/build -L./pcre2/build demo.c -o demo -lpcre2-8
# Finally, run our demo:
./demo 'c.t' 'dogs and cats'
# We fetched, built, and called PCRE2 successfully! :)
The main ways of obtaining PCRE2 are:
- Via Git clone:

      git clone https://github.com/PCRE2Project/pcre2.git

  Please use a release tag in production, not the development branch!
- Via download of the release tarball.
- Finally, PCRE2 is also bundled by various downstream package managers (such as Linux distributions, or vcpkg). These are provided by third parties, not the PCRE2 project.
The main ways of building PCRE2 are:
- Via CMake (Linux/Windows/macOS, and others):

      cd pcre2/
      cmake -B build .
      cmake --build build/

- Via Autoconf (Linux/Unix):

      cd pcre2/
      ./configure
      make

See "Platforms" below for links to more detailed build documentation.
The PCRE2 API supports strings in 8-bit, 16-bit, and 32-bit encodings, with or without UTF encoding. There is also EBCDIC support.
The default regular expression dialect closely matches the syntax and behaviour of Perl 5, with PCRE2-specific extensions. A wide variety of granular flags can be passed to the PCRE2 API to customise this to more closely follow other dialects such as JavaScript or Python.
The default matching engine uses a depth-first tree search with backtracking, which is highly feature-rich but has worst-case exponential time (PCRE2 allows aborting the match if a time limit is exceeded, expressed as a maximum number of steps in the tree search). The second matching engine uses a JIT for greatly improved performance, compiling the regular expression to a block of equivalent native machine code.
PCRE2 also has a third matching engine, based on a DFA, which is generally slower but has worst-case polynomial matching time and is able to find the POSIX-style "leftmost-longest" match.
There are accompanying utility functions for converting glob patterns and POSIX BRE/ERE patterns to PCRE2 regular expressions; and also for performing high-level regular expression operations such as search-and-replace with a powerful replacement string syntax.
As well as the PCRE2 API, the library also offers a POSIX-compatible <regex.h> header and regexec() function. However, this does not provide the ability to pass PCRE2 flags, so we recommend users consume the PCRE2 API if possible.
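For comparison, this is roughly what code written against the POSIX interface looks like. The sketch uses only the standard `regcomp()`, `regexec()`, and `regfree()` calls and includes the system `<regex.h>`. Whether those calls resolve to PCRE2's wrapper depends on which header and library the program is built against, which is an assumption about your setup. The helper `posix_find` is hypothetical.

```c
/* Sketch: a search using the POSIX regex interface. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 and stores the match bounds on success,
   0 on no match, and -1 if the pattern does not compile. */
int posix_find(const char *pattern, const char *subject, int *start, int *end) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    regmatch_t m[1];
    int rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0) return 0;

    *start = (int)m[0].rm_so; /* offset of the first matched character */
    *end = (int)m[0].rm_eo;   /* offset just past the end of the match */
    return 1;
}
```

Note that the only way to influence matching here is through the POSIX `REG_*` flags, which is the limitation described above.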
See the full library and API documentation for further details.
For third-party documentation, see:
- A curated summary of changes for each PCRE release, and some excellent tutorials on PCRE2 on the RexEgg website.
- Jan Goyvaerts' popular Regular-Expressions.info site includes information about PCRE2 as well as tutorials and highly detailed comparisons of PCRE2 to other regular expression dialects.
- Jeffrey Friedl's book Mastering Regular Expressions includes chapters on Perl and PCRE, and is available in print and online via O'Reilly Media.
PCRE2 is portable C code, and is likely to work on any system with a C99 compiler.
- Operating systems: Our continuous integration tests on Linux (GCC and Clang, glibc and musl), Windows (MSVC and MinGW-x64), and macOS (Clang), as well as FreeBSD and Solaris (Oracle Studio cc).
- Processors: PCRE2 is tested continuously on x86 (i686 and amd64), ARM 32- and 64-bit (armv7 and aarch64), RISC-V (riscv64), POWER (ppc64le), and the big-endian S390x.
Other systems are likely to work (including mobile, embedded platforms, and commercial UNIX systems), but these are not tested continuously by the PCRE2 maintainers. Users are encouraged to run the full PCRE2 test suite when compiling for any new platform. We are aware of working ports to VMS and z/OS (PCRE2 supports EBCDIC).
PCRE2 releases support CMake for building, and for UNIX platforms include a ./configure script built by Autoconf. Build files for the Bazel build system and zig build are also included. Integrating PCRE2 with other systems can be done by including the .c files in an existing project.
Please see the files README and NON-AUTOTOOLS-BUILD for full build documentation, as well as the man pages, including man pcre2/doc/pcre2build.3.
PCRE2 is released under the BSD 3-clause licence with a PCRE2 Exception. It is open-source and also corporate-friendly.
- See LICENCE for legal text.
- See AUTHORS for details of the current maintainers of PCRE2 and acknowledgements of its contributors, including Philip Hazel, the original author.
Join the community by reporting issues or asking questions via GitHub issues. We welcome feedback and proposals.
Contributions ranging from bug fixes to feature requests are welcome, and can be made via GitHub pull requests.
Please review our SECURITY policy for information on reporting security issues.
Release announcements will be made via the [email protected] mailing list, where you can also start discussions about PCRE2 issues and development. You can browse the list archives.