perl compatible matching of '\n' with '.' when (*NUL) #43

Closed
carenas opened this issue Nov 11, 2021 · 1 comment


carenas commented Nov 11, 2021

perl doesn't match '\n' with '.' unless the "s" modifier is provided, regardless of what the input record separator is, as shown by:

$ printf '\n\na\0' | perl -ne 'BEGIN { $/="\0" } /(?<=\n)(.*)$/ and print $1' | od -c | head -1
0000000    a  \0
$ printf '\n\na\0' | perl -ne 'BEGIN { $/="\0" } /(?<=\n)(.*)$/s and print $1' | od -c | head -1
0000000   \n   a  \0
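For anyone without perl at hand, the same distinction can be sketched with Python's `re` module, which (like perl) excludes only '\n' from '.' unless its DOTALL flag is set. This is only an analogue and is neither perl nor PCRE2; the `record` variable and the `capture` helper are illustrative names, not part of any of the tools discussed here.

```python
import re

# One NUL-terminated record, as perl would read it with $/ = "\0".
record = "\n\na\0"
pattern = r"(?<=\n)(.*)$"

def capture(flags=0):
    """Return group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, record, flags)
    return m.group(1) if m else None

without_s = capture()      # '.' refuses to cross '\n'
with_s = capture(re.S)     # DOTALL: '.' also matches '\n'
print(repr(without_s), repr(with_s))
```

Without the flag the match has to start after the second '\n'; with it, the match can start right after the first one and swallow the second.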

GNU grep (which uses the old PCRE) shows similar behaviour when using NUL as the line delimiter (-z is for NUL-separated input, and also sets the output line terminator, just like perl's "-0 -l"):

$ printf '\n\na\0' | ggrep -Pzo '(?<=\n).*$' | od -c | head -1
0000000    a  \0

but PCRE2 does not if the newline is not LF (or a compatible ANY or ANYCRLF). That is actually validated by the testsuite (set 2) and IMHO makes more sense, but it will prevent grep (which has recently been updated to pcre2 in its unreleased version) from using PCRE2_NEWLINE_NUL, as this change of behaviour might be considered a regression.

$ printf '\n\na\0' | pcre2grep -o -NNUL '(?<=\n).*$' | od -c | head -1
0000000   \n   a  \n
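A rough model of why the '\n' ends up in the output: when NUL is the newline, a non-DOTALL '.' excludes only NUL. The sketch below emulates that in Python by writing `[^\x00]` in place of '.', on the assumption that this is an adequate stand-in for the PCRE2 behaviour. It does not call PCRE2, and the variable names are illustrative.

```python
import re

data = "\n\na\0"

# Assumption: with NUL as the newline, non-DOTALL '.' excludes only NUL,
# which is spelled out here as the class [^\x00].
pattern = re.compile(r"(?<=\n)([^\x00]*)$")

# Split into NUL-terminated lines, dropping the empty tail after the final NUL.
lines = [line for line in data.split("\0") if line]
matches = [m.group(1) for line in lines if (m := pattern.search(line))]
print([repr(s) for s in matches])
```

Because '\n' is now an ordinary character, the match can begin right after the first '\n' and include the second, which is what the pcre2grep output shows before its own '\n' output terminator.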

The documentation explicitly says that no changes in matching are expected when the newline definition is changed. When the newline is '\n', an equivalent "s" mode is provided through PCRE2_DOTALL, making the result the same as perl's for the modes that have '\n' as a valid newline delimiter, but not if CR or NUL is used, so there is at least a possibility this might be a "bug"?

FWIW, I confirmed that it is at least not a regression: 8.x, while not having NUL, behaves the same when using CR, which is consistent with the behaviour observed in 10.x.


carenas commented Dec 2, 2021

Not a bug, but a difference in design decisions between PCRE and perl, which makes the behaviour naturally incompatible.
