Heap-Buffer-Overflow in pcre2(src/pcre2test.c:2945:7 in utf82ord) #235

Closed
longuu9 opened this issue Apr 18, 2023 · 3 comments · Fixed by #237

longuu9 commented Apr 18, 2023

We found a heap-buffer-overflow in pcre2-10.43-DEV (src/pcre2test.c:2945:7 in utf82ord), which can also be reproduced on pcre2-10.42.

Command Input

pcre2test -d poc_file /dev/null

The poc_file is attached.

Sanitizer Dump

==758640==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004200 at pc 0x0000004fa2a9 bp 0x7fff943b5ea0 sp 0x7fff943b5e98
READ of size 1 at 0x629000004200 thread T0
    #0 0x4fa2a8 in utf82ord /root/target/Invariants/pcre2/src/pcre2test.c:2945:7
    #1 0x4d4283 in pchars8 /root/target/Invariants/pcre2/src/pcre2test.c:3062:14
    #2 0x4ebd29 in process_data /root/target/Invariants/pcre2/src/pcre2test.c:8067:5
    #3 0x4cef20 in main /root/target/Invariants/pcre2/src/pcre2test.c:9470:12
    #4 0x7f290e11c082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
    #5 0x41c35d in _start (/root/target/Invariants/pcre2/pcre2test+0x41c35d)

0x629000004200 is located 0 bytes to the right of 16384-byte region [0x629000000200,0x629000004200)
allocated by thread T0 here:
    #0 0x4978d9 in realloc /root/test/fuzzing_python/llvm-project-llvmorg-12.0.0/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x4de0d0 in process_data /root/target/Invariants/pcre2/src/pcre2test.c:6867:24
    #2 0x4cef20 in main /root/target/Invariants/pcre2/src/pcre2test.c:9470:12
    #3 0x7f290e11c082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/target/Invariants/pcre2/src/pcre2test.c:2945:7 in utf82ord
Shadow bytes around the buggy address:
  0x0c527fff87f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c527fff8800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c527fff8810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c527fff8820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c527fff8830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c527fff8840:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c527fff8850: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c527fff8860: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c527fff8870: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c527fff8880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c527fff8890: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==758640==ABORTING
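
For readers checking the numbers in the dump above, here is a minimal sketch of the arithmetic behind it. It assumes the default AddressSanitizer shadow mapping on x86-64 Linux (shadow = (address >> 3) + 0x7fff8000); the helper names are illustrative and are not part of ASan or pcre2.

```c
#include <stdint.h>

/* Assumption: default ASan shadow mapping on x86-64 Linux.
   One shadow byte covers 8 application bytes. */
#define ASAN_SHADOW_SCALE  3
#define ASAN_SHADOW_OFFSET UINT64_C(0x7fff8000)

/* Hypothetical helper: address of the shadow byte describing `addr`. */
static uint64_t asan_shadow_addr(uint64_t addr)
{
  return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: size of a half-open region [start, end). */
static uint64_t region_size(uint64_t start, uint64_t end)
{
  return end - start;
}

/* Hypothetical helper: distance of `addr` past the end of the region. */
static uint64_t bytes_past_end(uint64_t addr, uint64_t end)
{
  return addr - end;
}
```

With the values from the report, region_size(0x629000000200, 0x629000004200) is 16384, bytes_past_end(0x629000004200, 0x629000004200) is 0, and asan_shadow_addr(0x629000004200) is 0x0c527fff8840, which is the row marked with `=>` whose first shadow byte is `fa`. In other words, the read touched the first byte immediately after the 16384-byte buffer.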

Environment

  • OS: Ubuntu 20.04.1
  • clang:12.0.0
  • pcre2:pcre2-10.43-DEV

We built pcre2 with AddressSanitizer (ASan):

./configure CC=clang CXX=clang++ CFLAGS='-g -O0 -fsanitize=address' CXXFLAGS='-g -O0 -fsanitize=address' --disable-shared

pcre2-10.43-DEV configuration summary:

    Install prefix ..................... : /usr/local
    C preprocessor ..................... : clang -E
    C compiler ......................... : clang
    Linker ............................. : /usr/bin/ld -m elf_x86_64
    C preprocessor flags ............... : 
    C compiler flags ................... : -g -O0 -fsanitize=address -fvisibility=hidden
    Linker flags ....................... : 
    Extra libraries .................... : 

poc_file.zip

Contributor

carenas commented Apr 19, 2023

A somewhat minimized reproduction is below (note that skipping UTF validation while providing an invalid subject is not supported, but that doesn't seem to be what triggers the bug):

PCRE2 version 10.43-DEV 2023-04-14 (8-bit)
  re> /(?<=..)X/match_invalid_utf
data> XX\xef\=ph

It reproduces only when doing a partial match, and it is triggered by a bug in a function that is used only internally by pcre2test. That function likely became vulnerable (to a read overflow) when matching of invalid UTF was added in 10.34, as it is not affected when using the regular utf mode:

PCRE2 version 10.34 2019-11-21
  re> /(?<=..)X/utf
data> XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2

pcre2test is also mostly a development tool, so it will be better to focus on issues in the library; therefore fuzzing through it is not ideal.

@PhilipHazel
Collaborator

This is obviously just another instance of using no_utf_check with invalid UTF-8, which is documented to give undefined behaviour.

Contributor

carenas commented Apr 19, 2023

I am afraid it is actually a real bug in the utf82ord() function inside pcre2test.c, which needs updating to deal with invalid UTF. As you said, invalid UTF was not expected there and normally shouldn't happen unless no_utf_check was used, which is not the case here.

I added a way for it to know the size of the buffer it is using, so it can better detect that it was passed invalid UTF and not read past its buffer to decode a UTF character that is not really there. There seem to be other issues with it that will need addressing independently, though, as shown in:

data> XX\xef\=ph
Partial match: \x{ef}
** ovector[1] is not equal to the subject length: 2 != 3

Although the partial match bug might have been there already, as shown by:

PCRE2 version 10.34 2019-11-21
  re> /(?<=..)X/match_invalid_utf,allvector
data> XX\x80\=ph,ovector=1
Partial match: \x{80}
** ovector[1] is not equal to the subject length: 2 != 3
 0: 2 2
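
The idea of telling the decoder how many bytes remain in its buffer can be sketched roughly as follows. This is an illustration only, not the code from pcre2test.c or from the eventual fix; the function name `decode_utf8_bounded` and its return convention are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the pcre2test implementation).
   Decodes one UTF-8 character starting at p, reading at most `avail` bytes.
   Returns the number of bytes consumed (1..4) and stores the code point in
   *out. Returns 0 if avail is 0. Returns -1 if the sequence is malformed or
   truncated; *out then holds the raw lead byte so a caller can print it as
   \x{..}. Overlong forms and surrogates are not rejected in this sketch. */
static int decode_utf8_bounded(const uint8_t *p, size_t avail, uint32_t *out)
{
  uint32_t c;
  int extra, i;

  if (avail == 0) return 0;
  c = p[0];
  *out = c;
  if (c < 0x80) return 1;                      /* plain ASCII */

  if ((c & 0xe0) == 0xc0)      { extra = 1; c &= 0x1f; }
  else if ((c & 0xf0) == 0xe0) { extra = 2; c &= 0x0f; }
  else if ((c & 0xf8) == 0xf0) { extra = 3; c &= 0x07; }
  else return -1;                              /* stray continuation or bad lead byte */

  /* The key check: never read continuation bytes that are not in the buffer. */
  if ((size_t)extra >= avail) return -1;

  for (i = 1; i <= extra; i++)
    {
    if ((p[i] & 0xc0) != 0x80) return -1;      /* not a continuation byte */
    c = (c << 6) | (p[i] & 0x3f);
    }

  *out = c;
  return extra + 1;
}
```

For the subject from the reproduction, XX\xef, calling this at offset 2 with avail set to 1 returns -1 without touching the byte after the buffer, because the lead byte 0xef announces two continuation bytes that are not there.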

carenas added a commit to carenas/pcre2 that referenced this issue Apr 20, 2023
When match_invalid_utf is enabled, invalid UTF-8 data can't match
but it was mistakenly getting printed as part of a partial match
even though the ovector correctly didn't include it, as shown by:

  PCRE2 version 10.34 2019-11-21
    re> /(?<=..)X/match_invalid_utf,allvector
  data> XX\x80\=ph,ovector=1
  Partial match: \x{80}
  ** ovector[1] is not equal to the subject length: 2 != 3
   0: 2 2

Fix the logic to print instead the empty match that was returned
and as a side effect avoid a buffer overread when trying to decode
UTF-8 that was missing code units.

Fixes: PCRE2Project#235
carenas added a commit to carenas/pcre2 that referenced this issue Apr 20, 2023
When match_invalid_utf is enabled, invalid UTF-8 data can't match
but it was mistakenly getting printed as part of a partial match
even though the ovector correctly didn't include it, as shown by:

  PCRE2 version 10.34 2019-11-21
    re> /(?<=..)X/match_invalid_utf,allvector
  data> XX\x80\=ph,ovector=1
  Partial match: \x{80}
  ** ovector[1] is not equal to the subject length: 2 != 3
   0: 2 2

Fix the logic to print instead the empty match that was returned
and address a buffer overread when trying to decode UTF-8 that was
missing code units.

Fixes: PCRE2Project#235
PhilipHazel pushed a commit that referenced this issue Apr 21, 2023
When match_invalid_utf is enabled, invalid UTF-8 data can't match
but it was mistakenly getting printed as part of a partial match
even though the ovector correctly didn't include it, as shown by:

  PCRE2 version 10.34 2019-11-21
    re> /(?<=..)X/match_invalid_utf,allvector
  data> XX\x80\=ph,ovector=1
  Partial match: \x{80}
  ** ovector[1] is not equal to the subject length: 2 != 3
   0: 2 2

Fix the logic to print instead the empty match that was returned
and address a buffer overread when trying to decode UTF-8 that was
missing code units.

Fixes: #235