Crash in DumpCompressedString / pglz_decompress #20

Open
xinjijia opened this issue Oct 20, 2023 · 3 comments

Comments

@xinjijia

pg_filedump.zip

pg_filedump -D int,varchar,xml,text,text -t -o 21359 >21359act_evt_log.txt

The file itself is intact; this is test data I created. I found that the -t option, which takes field values from the TOAST file, causes the program to crash.

@df7cb
Owner

df7cb commented Jun 4, 2024

Crashes here as well:

Core was generated by `postgresql-filedump.git/pg_filedump -D int,varchar,xml,text,text -o -t 20/21359'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055aefc4a73c0 in pglz_decompress ()
(gdb) bt
#0  0x000055aefc4a73c0 in pglz_decompress ()
#1  0x000055aefc4a5357 in DumpCompressedString (parse_value=0x10f6, compressed_size=4342, 
    data=0x55aefcb8ded0 "\305\343\341", <incomplete sequence \332>) at decode.c:1309
#2  ReadStringFromToast (buffer=<optimized out>, parse_value=parse_value@entry=0x55aefc4a4e60 <CopyAppendEncode>, 
    out_size=0x7ffcbda437d4, buff_size=<optimized out>) at decode.c:1427
#3  0x000055aefc4a60b7 in extract_data (buffer=<optimized out>, buff_size=<optimized out>, out_size=0x7ffcbda437d4, 
    parse_value=0x55aefc4a4e60 <CopyAppendEncode>) at decode.c:1124
#4  0x000055aefc4a64da in FormatDecode (tupleData=0x55aefcb8b7c8 "\367\t", tupleSize=<optimized out>)
    at decode.c:1277
#5  0x000055aefc4a3c36 in FormatItemBlock (toastRead=<synthetic pointer>, toastValue=<optimized out>, 
    toastExternalSize=<optimized out>, toastOid=<optimized out>, isToast=<optimized out>, page=<optimized out>, 
    buffer=<optimized out>) at pg_filedump.c:1402
#6  FormatBlock (controlOptions=<optimized out>, toastRead=<synthetic pointer>, toastValue=<optimized out>, 
    toastExternalSize=<optimized out>, toastOid=<optimized out>, isToast=false, blockSize=8192, currentBlock=0, 
    buffer=<optimized out>, blockOptions=<optimized out>) at pg_filedump.c:1957
#7  DumpFileContents (blockOptions=896, controlOptions=0, fp=0x55aefcb892a0, blockSize=8192, 
    blockStart=<optimized out>, blockEnd=-1, isToast=false, toastOid=0, toastExternalSize=0, toastValue=0x0)
    at pg_filedump.c:2267
#8  0x000055aefc4a055e in main (argv=6, argc=0x7ffcbda43b38) at pg_filedump.c:2404

df7cb changed the title from "Hello, there is a problem with the TOAST value" to "Crash in DumpCompressedString / pglz_decompress" on Jun 4, 2024
@df7cb
Owner

df7cb commented Jun 7, 2024

This specific case seems to be fixed by 6d81d28, but there is still a bug in the detoasting code that I have not been able to track down yet. I got lost several times in the maze of VARLENA_*, VARDATA_* and TOAST_* macros, decoding functions that call each other along several different code paths, and duplicated compression routines that share some, but not all, of their code.

https://github.com/df7cb/pg_filedump/actions/runs/9419585323/job/25949658152#step:7:89

@GetsuDer perhaps you can spot it?

@GetsuDer
Contributor

After some time with gdb, I think (I hope) I have found the problem. The TOAST data we read with DumpFileContents has its chunks in the wrong order: unlike the PostgreSQL code, we don't use the index on (chunk id, chunk seq), and simply collect the chunks in the order they appear in the file.
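PostgreSQL's toast index is keyed on (chunk_id, chunk_seq). A minimal sketch of restoring that order in memory, assuming the chunks read from the toast file are first collected into an array; the ToastChunk struct and sort_toast_chunks() are hypothetical names for illustration, not existing pg_filedump code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one chunk collected from the toast relation
 * file; not pg_filedump's actual data structure. */
typedef struct
{
    uint32_t    chunk_id;   /* OID of the TOAST value this chunk belongs to */
    int32_t     chunk_seq;  /* position of the chunk within that value */
    const char *data;       /* chunk payload */
    size_t      size;       /* payload length in bytes */
} ToastChunk;

/* Compare by (chunk_id, chunk_seq), the same key as the toast index. */
static int
compare_chunks(const void *a, const void *b)
{
    const ToastChunk *x = (const ToastChunk *) a;
    const ToastChunk *y = (const ToastChunk *) b;

    if (x->chunk_id != y->chunk_id)
        return x->chunk_id < y->chunk_id ? -1 : 1;
    if (x->chunk_seq != y->chunk_seq)
        return x->chunk_seq < y->chunk_seq ? -1 : 1;
    return 0;
}

/* Put chunks into logical order instead of physical file order. */
void
sort_toast_chunks(ToastChunk *chunks, size_t n)
{
    assert(chunks != NULL || n == 0);
    if (n > 1)
        qsort(chunks, n, sizeof(ToastChunk), compare_chunks);
}
```

Sorting on chunk_seq alone would only be enough if the array held the chunks of a single value; including chunk_id in the key keeps the chunks of different values apart.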

Here is partial output from running the failing test with the -v option:

Item   7 -- Length:   66  Offset: 7456 (0x1d20)  Flags: NORMAL
  TOAST value. Raw size:   700004, external size:     8035, value id:  16392, toast relation id:  16387, chunks:      5

	Block    0 ********************************************************
	<Header> -----
	 Block Offset: 0x00000000         Offsets: Lower      44 (0x002c)
	 Block: Size 8192  Version    4            Upper    1928 (0x0788)
	 LSN:  logid      0 recoff 0x0156f450      Special  8192 (0x2000)
	 Items:    5                      Free Space: 1884
	 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0004 (ALL_VISIBLE)
	 Length (including item array): 44

	<Data> -----
	 Item   1 -- Length: 2032  Offset: 6160 (0x1810)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 0, chunk data size: 0
	 Item   2 -- Length:  840  Offset: 5320 (0x14c8)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 0, chunk data size: 0
	 Item   3 -- Length: 2032  Offset: 3288 (0x0cd8)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 0, chunk data size: 0
	 Item   4 -- Length: 1266  Offset: 2016 (0x07e0)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 0, chunk data size: 0
	 Item   5 -- Length:   87  Offset: 1928 (0x0788)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 4, chunk data size: 51

	Block    1 ********************************************************
	<Header> -----
	 Block Offset: 0x00002000         Offsets: Lower      40 (0x0028)
	 Block: Size 8192  Version    4            Upper      64 (0x0040)
	 LSN:  logid      0 recoff 0x0156f398      Special  8192 (0x2000)
	 Items:    4                      Free Space:   24
	 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0004 (ALL_VISIBLE)
	 Length (including item array): 40

	<Data> -----
	 Item   1 -- Length: 2032  Offset: 6160 (0x1810)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 0, chunk data size: 1996
	 Item   2 -- Length: 2032  Offset: 4128 (0x1020)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 1, chunk data size: 1996
	 Item   3 -- Length: 2032  Offset: 2096 (0x0830)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 2, chunk data size: 1996
	 Item   4 -- Length: 2032  Offset:   64 (0x0040)  Flags: NORMAL
	  Read TOAST chunk. TOAST Oid: 16392, chunk id: 3, chunk data size: 1996

pg_filedump even prints the chunk ids for us here. So in ReadStringFromToast we get all the TOAST data in one (compressed) buffer, toast_data, but in the order chunk 4, chunk 0, chunk 1, chunk 2, chunk 3. I checked with gdb: when the buffer starts with chunk 0, the data parses correctly and everything is fine.
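To illustrate the reassembly step, here is a minimal sketch, assuming the chunks of one value have already been sorted by chunk_seq: it rejects a missing or out-of-order chunk and checks that the payload sizes add up to the external size before concatenating. The ToastChunk struct and reassemble_toast_value() are hypothetical, not pg_filedump functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk record (not pg_filedump's real structure). */
typedef struct
{
    int32_t     chunk_seq;  /* position of the chunk within the value */
    const char *data;       /* chunk payload */
    size_t      size;       /* payload length in bytes */
} ToastChunk;

/*
 * Concatenate the chunks of one TOAST value into out (capacity out_cap).
 * Returns the number of bytes written, or -1 if a chunk is missing or out
 * of order, the total does not match external_size, or out is too small.
 */
long
reassemble_toast_value(const ToastChunk *chunks, size_t n,
                       size_t external_size, char *out, size_t out_cap)
{
    size_t total = 0;

    assert(out != NULL || out_cap == 0);

    /* First pass: chunk_seq must run 0, 1, ..., n-1 with no gaps. */
    for (size_t i = 0; i < n; i++)
    {
        if (chunks[i].chunk_seq != (int32_t) i)
            return -1;
        total += chunks[i].size;
    }
    if (total != external_size || total > out_cap)
        return -1;

    /* Second pass: copy the payloads in sequence order. */
    total = 0;
    for (size_t i = 0; i < n; i++)
    {
        memcpy(out + total, chunks[i].data, chunks[i].size);
        total += chunks[i].size;
    }
    return (long) total;
}
```

With the sizes from the -v output, 4 × 1996 + 51 = 8035 bytes, which matches the reported external size. Fed the chunks in file order (4, 0, 1, 2, 3), the sequence check fails instead of passing misordered data on to pglz_decompress.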

Unfortunately, I don't see a quick and easy fix here, and I will be quite busy for the near future, but I hope to return to this issue as soon as I can.
