
🐞 [Certain files compressed with ZPAQ fail to compress successfully] #126

Closed
Merculous opened this issue Jun 28, 2023 · 5 comments


Merculous commented Jun 28, 2023

lrzip-next Version

lrzip-next version 0.11.2-2-b936f23

lrzip-next command line

lrzip-next -T -z -L1 -vv "$@"

What happened?

Version 0.11.2-2-b936f23 produces corrupt files when compressing with ZPAQ.

Earlier versions compressed these files just fine, so I believe this is an issue introduced by the latest changes to the ZPAQ code itself. LZMA is the only other compression method I use, and I've never had any issues with it.

What was expected behavior?

The file should have compressed successfully. Nothing more to add here.

Steps to reproduce

Download this file: https://drive.google.com/file/d/19VQf8fceSAFU_WewOeo8nt3Svsdj77TW/view?usp=sharing

The attached file is a tar archive of a slightly modified version of https://github.com/acidanthera/OpenCorePkg/tree/master/Utilities/macrecovery, but with the dmg files included.

It is roughly 6 GB in total; no compression is applied.

Use lrzip-next -T -z -L1 -vv macrecovery.tar

OR, for a much smaller test case:

Clone my repo: https://github.com/Merculous/Jailbreak-Programs
(HEAD is only 471 MB, so this is probably a more suitable test case.)

Extract all of the 7z archives, put the extracted contents into a folder, tar it, then run the same command as above.

Relevant log output

% lrzip-next-zpaq-fastest macrecovery.tar
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 24
Detected 135,153,250,304 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /tmp/
Compression mode is: ZPAQ. LZ4 Compressibility testing disabled
Compression level 1
RZIP Compression level 1
ZPAQ Compression Level: 1, ZPAQ initial Block Size: 4
MD5 Hashing Used
Heuristically Computed Compression Window: 859 = 85,900MB
Storage time in seconds 1,392,375,825
File size: 5,667,072,000
Succeeded in testing 5,667,072,000 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 5,667,072,000
Byte width: 5
Per Thread Memory Overhead is 150,994,944
Succeeded in testing 9,441,945,600 sized malloc for back end compression
Using up to 25 threads to compress up to 16,773,120 bytes each.
Beginning rzip pre-processing phase
hashsize = 131,072.  bits = 17. 2MB
Starting sweep for mask 15
Starting sweep for mask 31
Starting sweep for mask 63
Starting sweep for mask 127
Starting thread 0 to compress 16,773,120 bytes from stream 1
Starting zpaq backend compression thread 0...
ZPAQ: Method selected: 14,128,0: level=1, bs=4, redundancy=128, type=binary/random
Starting sweep for mask 255
Starting thread 1 to compress 16,773,120 bytes from stream 1
Starting zpaq backend compression thread 1...
ZPAQ: Method selected: 14,128,0: level=1, bs=4, redundancy=128, type=binary/random
Starting sweep for mask 511
Starting thread 2 to compress 16,773,120 bytes from stream 1
Starting zpaq backend compression thread 2...
ZPAQ: Method selected: 14,128,0: level=1, bs=4, redundancy=128, type=binary/random

Removed some of the log output here because the maximum comment length is 65,536 characters (16 bits) :/

ZPAQ Thread 22 Completed
Starting thread 10 to compress 16,773,120 bytes from stream 1
Starting zpaq backend compression thread 10...
ZPAQ: Method selected: 14,128,0: level=1, bs=4, redundancy=128, type=binary/random
ZPAQ Thread 21 Completed
Compthread 21 seeking to 2,409,239,356 to store length 5
Compthread 21 seeking to 2,424,821,375 to write header
Thread 21 writing 16,454,936 compressed bytes from stream 1
Compthread 21 writing data at 2,424,821,391
Compthread 22 seeking to 2,424,821,386 to store length 5
Compthread 22 seeking to 2,441,276,327 to write header
Thread 22 writing 16,585,893 compressed bytes from stream 1
Compthread 22 writing data at 2,441,276,343
Total: 49%  Chunk: 49%
Starting thread 11 to compress 16,773,120 bytes from stream 1
Starting zpaq backend compression thread 11...
ZPAQ: Method selected: 14,128,0: level=1, bs=4, redundancy=128, type=binary/random
Incompressible block
double free or corruption (!prev)
Aborted (core dumped)

Please provide system details

OS Distro:
Ubuntu 22.04.2 LTS x86_64

Kernel Version (uname -a):
Linux server 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

System ram (free -h):
        total   used    free    shared  buff/cache  available
Mem:    125Gi   1.0Gi   77Gi    22Mi    46Gi        123Gi
Swap:   8.0Gi   0.0Ki   8.0Gi

Additional Context

free(): invalid size6%
[1] 2572128 IOT instruction (core dumped) lrzip-next -T -z -L2 -vv macrecovery.tar

munmap_chunk(): invalid pointer
[1] 3147239 IOT instruction (core dumped) lrzip-next -T -z -L3 -vv macrecovery.tar

lrzip-next -T -z -L4 -vv macrecovery.tar
Write of length 33,550,336 failed - Bad address
Failed to write_buf s_buf in compthread 11
Deleting broken file macrecovery.tar.lrz
Fatal error - exiting

Here's the sha256 of macrecovery.tar: bb3f87028529f7f5c2eb730d198a2b44a207aa6702c10f4e2938cc8e11a29103

I believe levels 5-9 produce the same issue as level 4. I don't really want to wait a long time just to get the same results, so that's all I'm testing for now (I'm very tired as I'm writing this).

I hope this is enough info to start searching for bugs. Thanks in advance!

Update: I've tested commit 94891e5 with the same options, in this case on the tar'd contents from Jailbreak-Programs, and it compressed successfully. So I'm assuming 2f84b91 may have broken compression in certain cases.

@pete4abw (Owner)

Thank you for the report. When I fixed the bug that kept zpaq levels 1 and 2 from working, it allowed for faster zpaq compression because all five zpaq levels could be used; that was lrzip-next v0.11.1. Next, I changed the way zpaq compression was called -- using compressBlock() instead of compress() -- because compress() limits the block size of the area being compressed, whereas compressBlock() does not. This may take a little work to figure out, and I'm hoping a regression to a smaller block size won't be necessary. Prior to enabling zpaq levels 1 and 2, lrzip-next level 1 was actually using zpaq level 3, since at the time levels 3, 4, and 5 were the only ones that worked.
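For context on the two call paths, here is a minimal sketch of the difference, assuming the stock libzpaq 7.x API (Reader/Writer/StringBuffer, compress(), compressBlock(), and the application-supplied libzpaq::error handler). It is illustrative only, not lrzip-next code; the "14,128,0" method string is simply the one reported in the log above.

// Illustrative sketch only -- assumes the stock libzpaq 7.x interface.
#include <cstdio>
#include <cstdlib>
#include "libzpaq.h"

// libzpaq requires the host application to supply this error handler.
void libzpaq::error(const char *msg) {
    fprintf(stderr, "libzpaq error: %s\n", msg);
    exit(1);
}

int main() {
    // Build two identical 8 MB inputs (trivially compressible zeros here).
    libzpaq::StringBuffer in1, in2, out1, out2;
    for (int i = 0; i < (8 << 20); ++i) { in1.put(0); in2.put(0); }

    // Older path: compress() walks the input itself and emits one zpaq block
    // per internal block-size worth of data.
    libzpaq::compress(&in1, &out1, "1");

    // Newer path: compressBlock() compresses the whole StringBuffer as a
    // single block, so the caller must size its buffers for the worst case --
    // an incompressible block can come back larger than it went in.
    libzpaq::compressBlock(&in2, &out2, "14,128,0");

    printf("compress(): %zu bytes, compressBlock(): %zu bytes\n",
           out1.size(), out2.size());
    return 0;
}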

So, the first step is to test with lrzip-next 0.11.1 and go from there! I think any file with an incompressible block will do. Thank you again!

cf: Fix Zpaq levels 1 and 2, and Maximize block size for zpaq

@pete4abw (Owner)

Hello again. Using the -T option disables a brief threshold test of block data. The failure is occurring in the zpaq code. What I think is going on is that the "compressed" buffer is actually larger than the source buffer, which may be causing a buffer overrun. I tested this all the way back to version 0.10.2. I also tested it with lrzip version 0.651 and it failed there too; lrzip uses an older zpaq API, but the problem is present as well. @ckolivas

Since zpaq is no longer maintained by @zpaq and the @fcorbelli version is not official, it may take some time to fix this. So, in the meantime, I recommend NOT using the -T option while I figure out what to do. I will leave this open for the time being.

Suggestions for a workaround or help fixing the zpaq bug are appreciated!

@fcorbelli

I am afraid there will be no more official versions of zpaq.

At least, every time I have asked Dr. Mahoney for help (he is very kind, actually), I could not get any concrete guidance. Maybe he will make some changes to the code in the future, but I doubt it very much.

@pete4abw (Owner)

So, after some testing, it appears that the StringBuffer class has to be set up and its functions called with a fixed block size. When a block is incompressible yet larger than the block size, some error occurs. I don't have the time to try to fix dead code, but what I can do is revert to the old way of compressing: calling compress() and letting it parse blocks one by one until done. This method does not seem to cause any problem when the -T option is passed to lrzip-next, but it does yield slightly poorer compression. I am also testing a small change to zpaq_compress_buf, similar to what I did with lzma: make the destination buffer 2% larger than the buffer to compress. This adds a cushion for when zpaq fails and returns a larger block than it was sent because the data was not compressible.
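A rough sketch of that sizing idea, assuming a plain malloc-style allocation; the helper name below is made up for illustration and this is not the actual zpaq_compress_buf code.

#include <cstdlib>

// Hypothetical helper showing the "2% cushion": allocate the destination a
// little larger than the source so a block that zpaq cannot shrink (and
// returns slightly expanded) still fits without overrunning the buffer.
static void *alloc_backend_dst(size_t src_len, size_t *dst_len)
{
    *dst_len = src_len + src_len / 50 + 16;   // ~2% headroom plus a few bytes
    return malloc(*dst_len);
}

The usual follow-up on the caller side is to treat any result that is not smaller than src_len as an incompressible block and store it uncompressed, rather than keep the expanded output.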

THAT SAID, best practice suggests that using a compression threshold saves both time and the problems that arise when the compression routine returns a block larger than it was sent! Using -T90 or -T95 is good.

There are two possible resolutions.

  1. Inhibit the use of -T with zpaq
  2. Revert to the old way of compressing

Threshold testing is not perfect, as it does not review the entire chunk of data, just the beginning of it. So forcing a data test prior to backend compression will not work 100% of the time, especially if the incompressible data is buried somewhere in the middle!
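To make that limitation concrete, here is an illustrative prefix-based compressibility test of the kind the -T option controls. The LZ4_compressBound and LZ4_compress_default calls are the stock liblz4 API; the helper name and the exact pass/fail rule are assumptions, and the real lrzip-next logic may differ.

#include <cstdlib>
#include <lz4.h>

// Compress only a sample from the start of the chunk and compare the result
// against a threshold percentage (e.g. 90 for -T90). Incompressible data
// hiding later in the chunk is invisible to this test -- the caveat above.
static bool sample_looks_compressible(const char *chunk, int chunk_len,
                                      int sample_len, int threshold_pct)
{
    if (sample_len > chunk_len)
        sample_len = chunk_len;

    int bound = LZ4_compressBound(sample_len);
    char *tmp = (char *)malloc(bound);
    if (!tmp)
        return true;              // cannot test; assume compressible

    int out = LZ4_compress_default(chunk, tmp, sample_len, bound);
    free(tmp);

    if (out <= 0)                 // LZ4 could not compress the sample at all
        return false;

    // Pass only if the sample shrank below threshold_pct of its size.
    return (long long)out * 100 < (long long)sample_len * threshold_pct;
}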

I'll push changes soon. We'll see how it goes!


Merculous commented Jun 29, 2023

I wouldn't revert zpaq compression to what it was previously; I'd rather see something different that zpaq doesn't already do (I always use zpaq, and sometimes this fork too). Compression is already good with lrzip in a lot of cases, so anything that speeds up compression with a slight sacrifice in ratio is usually the best way to go, in my opinion.

pete4abw added a commit that referenced this issue Jun 29, 2023
  Addresses #126 reported by @Merculous.
  Zpaq compression would fail when the `-T` option was used on a
  block that was incompressible. There was some buffer overrun that
  caused the failure. However, debugging dead code is out of scope.

  Error from lrzip-next 0.11.2 and earlier
  ========================================
  Incompressible block
  double free or corruption (!prev)
  Aborted (core dumped)

  Revert to calling the compress() function and let it handle calling
  compressBlock() incrementally, sending one block-size block of
  data at a time.

  This could also occur if any data filter is used, since a filter
  also bypasses threshold testing.
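The revert described in this commit relies on compress() feeding the data to compressBlock() one block-size piece at a time. A hedged sketch of that slicing loop, again assuming the libzpaq 7.x API and with a made-up helper name (not lrzip-next code):

#include <cstddef>
#include "libzpaq.h"

// Feed len bytes to the backend one block-size piece at a time, so no single
// zpaq block ever exceeds block_size even when the data is incompressible.
static void compress_in_blocks(const char *src, size_t len, size_t block_size,
                               libzpaq::Writer *out, const char *method)
{
    for (size_t off = 0; off < len; off += block_size) {
        size_t n = (len - off < block_size) ? (len - off) : block_size;
        libzpaq::StringBuffer in;
        in.write(src + off, (int)n);   // copy one piece into the block buffer
        libzpaq::compressBlock(&in, out, method);
    }
}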