panic: regrepeat() called with unrecognized node type 3 #17677

Closed
dur-randir opened this issue Mar 27, 2020 · 10 comments · Fixed by #17753
@dur-randir
Member

This is a bug report for perl from [email protected],
generated with the help of perlbug 1.41 running under perl 5.31.10.

[Please describe your issue here]

While fuzzing perl v5.31.9-70-g0c96aa4b7b built with afl and run
under libdislocator, I found the following program

0=~/\p{nv="\A?"}/

to emit 'panic: regrepeat() called with unrecognized node type 3='MBOL' at -e line 1.'. A similar panic can also be triggered for node type 5='MEOL'. The GDB stack trace at the point of croak() follows:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7c24535 in __GI_abort () at abort.c:79
#2 0x0000555555726811 in Perl_vcroak (pat=0x555555baeaa8 "panic: regrepeat() called with unrecognized node type %d='%s'", args=0x7fffffffaf10)
at util.c:1745
#3 0x0000555555726b3d in Perl_croak (pat=0x555555baeaa8 "panic: regrepeat() called with unrecognized node type %d='%s'") at util.c:1803
#4 0x00005555558de989 in S_regrepeat (prog=0x555555c42970, startposp=0x7fffffffb1f8, p=0x555555c45ad0, loceol=0x555555b03dcc "", reginfo=0x7fffffffbb90,
max=1, depth=0) at regexec.c:10075
#5 0x00005555558d74c1 in S_regmatch (reginfo=0x7fffffffbb90, startpos=0x555555b03dc8 "-1/2", prog=0x555555c45ac8) at regexec.c:8689
#6 0x00005555558c4491 in S_regtry (reginfo=0x7fffffffbb90, startposp=0x7fffffffb958) at regexec.c:4029
#7 0x00005555558c3e2a in Perl_regexec_flags (rx=0x555555c3af88, stringarg=0x555555b03dc8 "-1/2", strend=0x555555b03dcc "", strbeg=0x555555b03dc8 "-1/2",
minend=0, sv=0x555555c3afa0, data=0x0, flags=0) at regexec.c:3892
#8 0x0000555555700e16 in S_execute_wildcard (prog=0x555555c3af88, stringarg=0x555555b03dc8 "-1/2", strend=0x555555b03dcc "", strbeg=0x555555b03dc8 "-1/2",
minend=0, screamer=0x555555c3afa0, nosave=0) at regcomp.c:23164
#9 0x0000555555702e01 in S_parse_uniprop_string (name=0x555555c42833 "nv="\A?"}", name_len=8, is_utf8=false, to_fold=false, runtime=false,
deferrable=true, strings=0x7fffffffc2a8, user_defined_ptr=0x7fffffffc2a0, msg=0x555555c3af40, level=0) at regcomp.c:23855
#10 0x00005555556e71f4 in S_regclass (pRExC_state=0x7fffffffd680, flagp=0x7fffffffcd34, depth=5, stop_at_1=true, allow_mutiple_chars=false,
silence_non_portable=false, strict=false, optimizable=true, ret_invlist=0x0) at regcomp.c:17675
#11 0x00005555556cc436 in S_regatom (pRExC_state=0x7fffffffd680, flagp=0x7fffffffcd34, depth=4) at regcomp.c:13731
#12 0x00005555556c2b32 in S_regpiece (pRExC_state=0x7fffffffd680, flagp=0x7fffffffce60, depth=3) at regcomp.c:12562
#13 0x00005555556c245f in S_regbranch (pRExC_state=0x7fffffffd680, flagp=0x7fffffffcf00, first=1, depth=2) at regcomp.c:12482
#14 0x00005555556bfc98 in S_reg (pRExC_state=0x7fffffffd680, paren=0, flagp=0x7fffffffd3a0, depth=1) at regcomp.c:12184
#15 0x00005555556a31d0 in Perl_re_op_compile (patternp=0x0, pat_count=1, expr=0x555555c42700, eng=0x555555c0ad20 <PL_core_reg_engine>, old_re=0x0,
is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c:7835
#16 0x00005555555be959 in Perl_pmruntime (o=0x555555c42738, expr=0x555555c42700, repl=0x0, flags=1, floor=0) at op.c:8336
#17 0x0000555555672dc3 in Perl_yyparse (gramtype=258) at perly.y:1293
#18 0x00005555555efa04 in S_parse_body (env=0x0, xsinit=0x5555555a21ff <xs_init>) at perl.c:2574
#19 0x00005555555eddf0 in perl_parse (my_perl=0x555555c15260, xsinit=0x5555555a21ff <xs_init>, argc=3, argv=0x7fffffffe1c8, env=0x0) at perl.c:1869
#20 0x00005555555a213d in main (argc=3, argv=0x7fffffffe1c8, env=0x7fffffffe1e8) at perlmain.c:132

This is a regression in blead; bisect points to 4829f32 as the first bad commit:

commit 4829f32decd128e6a122bd8ce35fe944bd87f104
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 15 17:39:00 2020 -0700

    Restrict features in wildcards

    The algorithm for dealing with Unicode property wildcards is to wrap the
    user-supplied pattern with /miaa.  We don't want the user to be able to
    override the /m and /aa parts.  Modifiers that are only specifiable as a
    modifier in a qr or similar op (like /gc) can't be included in things
    like (?gc).  These normally incur a warning that they are ignored, but
    the texts of those warnings are misleading when using wildcards, so I
    chose to just make them illegal.  Of course that could be changed to
    having custom useful warning texts, but I didn't think it was worth it.

    I also chose to forbid recursion of using nested \p{}, just from fear
    that it might lead to issues down the road, and it really isn't useful
    for this limited universe of strings to match against.  Because
    wildcards currently can't handle '}' inside them, only the single letter
    \p,\P are valid anyway.

    Similarly, I forbid the '*' quantifier to make it harder for the
    constructed subpattern to take forever to make any progress and decide
    to halt.  Again, using it would be overkill on the universe of possible
    match strings.

[Please do not change anything below this line]
Flags:
category=core
severity=high
Site configuration information for perl 5.31.10:

Configured by root at Fri Mar 13 17:15:02 MSK 2020.

Summary of my perl5 (revision 5 version 31 subversion 10) configuration:
Commit id: 0c96aa4
Platform:
osname=linux
osvers=4.19.0-8-amd64
archname=x86_64-linux
uname='linux dorothy 4.19.0-8-amd64 #1 smp debian 4.19.98-1 (2020-01-26) x86_64 gnulinux '
config_args='-de -Dusedevel -Doptimize=-O2'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='8.3.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.28.so
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.28'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

@inc for perl 5.31.10:
lib
/usr/local/lib/perl5/site_perl/5.31.10/x86_64-linux
/usr/local/lib/perl5/site_perl/5.31.10
/usr/local/lib/perl5/5.31.10/x86_64-linux
/usr/local/lib/perl5/5.31.10

Environment for perl 5.31.10:
HOME=/home/afl
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.UTF-8
LC_TIME=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.30.0-dbg/bin:/opt/local/bin:/usr/texbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PERLBREW_HOME=/home/afl/.perlbrew
PERLBREW_MANPATH=/home/afl/perlbrew/perls/perl-5.30.0-dbg/man
PERLBREW_PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.30.0-dbg/bin
PERLBREW_PERL=perl-5.30.0-dbg
PERLBREW_ROOT=/home/afl/perlbrew
PERLBREW_SHELLRC_VERSION=0.88
PERLBREW_VERSION=0.88
PERL_BADLANG (unset)

@khwilliamson
Contributor

This is the result of \A being marked as matching a single character, which of course it doesn't.

I was planning to fix this in early 5.33. Is that sufficiently soon?

@dur-randir
Member Author

Since it's experimental, I don't think it's too bad to leave it as-is, but maybe it deserves a note somewhere in the caveats section.

@khwilliamson
Contributor

FWIW, I suspect there are patterns not using the experimental feature that could cause this to happen.

@khwilliamson khwilliamson removed this from the 5.32.0 milestone Apr 1, 2020
@dur-randir
Member Author

It's bad, since I've blacklisted this particular output from results, so I can't tell if I see it on other patterns or not.

@khwilliamson
Contributor

Do you want me to give you a branch with this fixed so you can see if you can get it on non-experimental paths?

@dur-randir
Member Author

Sure.

khwilliamson added a commit that referenced this issue Apr 26, 2020
@khwilliamson
Contributor

Having looked at the code, I retract my assertion that non-wildcard patterns could have this. But this branch has it fixed: smoke-me/khw-randir

@hvds
Contributor

hvds commented Apr 26, 2020

@khwilliamson that looks near identical to the first commit in smoke-me/hv/gh17594.

@dur-randir
Member Author

I've restarted the fuzzer.

@khwilliamson
Contributor

@hvds, ok. This was just to get @dur-randir going again.

@khwilliamson khwilliamson added this to the 5.32.0 milestone Apr 26, 2020
khwilliamson added a commit that referenced this issue Apr 26, 2020
The reason this bug occurs is that wildcard matching changes the anchor
assertions \A, \Z, and \z, without corresponding changes in regexec.c.

We earlier noticed that all these were being marked SIMPLE, and a
zero-width construct shouldn't really be.  But it was considered too
late in the development cycle to make that change.  So the plan was to
live with this bug in an experimental feature in 5.32.

But I eventually realized that the change could be effected for just the
wildcard versions, and this commit does that.  If there is some issue
with making these non-SIMPLE, it will affect only the wildcard feature,
and those potential bugs are better than a known bug.  It also seems
unlikely that this will introduce any bug.  What removing SIMPLE does is
merely remove potential optimizations in the handling.  The most general
case should work; it's doing an improper optimization that gets one
into trouble.

This fixes #17677
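
For readers unfamiliar with the SIMPLE flag, the failure mode described in this commit message can be illustrated with a toy model. The sketch below is not the code in regcomp.c or regexec.c; every name in it (toy_node, toy_repeat, toy_match_once, toy_optional, TOY_PANIC, TOY_FAIL) is invented for illustration. It assumes only what the thread states: a node flagged as "simple" is routed to a fast repeat routine that knows one-character node types, and a zero-width anchor with no case there falls through to the "unrecognized node type" branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node types, invented for this sketch. */
typedef enum { TOY_DIGIT, TOY_ANY, TOY_LINE_START } toy_type;

typedef struct {
    toy_type type;
    bool simple;   /* compiler's claim: "this node consumes exactly one char" */
} toy_node;

#define TOY_PANIC (-1)   /* stands in for the croak() in the fast path */
#define TOY_FAIL  (-2)   /* ordinary "no match" in the general path */

/* Fast path: count up to `max` consecutive characters matched by a
 * one-character node. It has no case for zero-width nodes, so it
 * reports TOY_PANIC if handed one. */
static int toy_repeat(toy_node n, const char *s, int max)
{
    int count = 0;
    assert(s != NULL);
    switch (n.type) {
    case TOY_DIGIT:
        while (count < max && s[count] >= '0' && s[count] <= '9') count++;
        return count;
    case TOY_ANY:
        while (count < max && s[count] != '\0') count++;
        return count;
    default:
        return TOY_PANIC;   /* "unrecognized node type" */
    }
}

/* General path: match the node once at offset `pos`. Returns the number
 * of characters consumed (0 for a zero-width success) or TOY_FAIL. */
static int toy_match_once(toy_node n, const char *s, size_t pos)
{
    switch (n.type) {
    case TOY_DIGIT:
        return (s[pos] >= '0' && s[pos] <= '9') ? 1 : TOY_FAIL;
    case TOY_ANY:
        return (s[pos] != '\0') ? 1 : TOY_FAIL;
    case TOY_LINE_START:
        return (pos == 0 || s[pos - 1] == '\n') ? 0 : TOY_FAIL;
    }
    return TOY_FAIL;
}

/* `node?` at offset `pos`: the fast path is taken only when the node is
 * flagged simple; otherwise the general path is used. */
static int toy_optional(toy_node n, const char *s, size_t pos)
{
    int r;
    if (n.simple)
        return toy_repeat(n, s + pos, 1);
    r = toy_match_once(n, s, pos);
    return (r == TOY_FAIL) ? 0 : r;   /* `?` is allowed to match nothing */
}
```

In this model, a TOY_LINE_START node with simple set to true yields TOY_PANIC from toy_optional(), while the same node with simple set to false goes through the general path and matches with zero width. That mirrors the trade-off the commit message describes: dropping the flag gives up an optimization but keeps the result correct.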
khwilliamson added a commit that referenced this issue Apr 29, 2020