Ignore LZMA_BUF_ERROR #18

genail · 2018-07-15T13:05:16Z

It shouldn't be treated as an error since it is produced when liblzma cannot generate any output yet. The documentation is a little unclear on that matter, but this error seems to recur on some data inputs, and ignoring it does not do any harm.

```c
	LZMA_BUF_ERROR          = 10,
		/**<
		 * \brief       No progress is possible
		 *
		 * This error code is returned when the coder cannot consume
		 * any new input and produce any new output. The most common
		 * reason for this error is that the input stream being
		 * decoded is truncated or corrupt.
		 *
		 * This error is not fatal. Coding can be continued normally
		 * by providing more input and/or more output space, if
		 * possible.
		 *
		 * Typically the first call to lzma_code() that can do no
		 * progress returns LZMA_OK instead of LZMA_BUF_ERROR. Only
		 * the second consecutive call doing no progress will return
		 * LZMA_BUF_ERROR. This is intentional.
		 *
		 * With zlib, Z_BUF_ERROR may be returned even if the
		 * application is doing nothing wrong, so apps will need
		 * to handle Z_BUF_ERROR specially. The above hack
		 * guarantees that liblzma never returns LZMA_BUF_ERROR
		 * to properly written applications unless the input file
		 * is truncated or corrupt. This should simplify the
		 * applications a little.
		 */
```
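The return-code rule described in the quoted comment can be illustrated with a toy model. This is only a sketch of the documented behaviour, not liblzma itself: `ToyCoder` and its `code` method are hypothetical names, and "coding" here just means consuming as many input bytes as there is output space.

```ruby
# Toy model only: mimics the documented return-code rule, not liblzma.
LZMA_OK        = 0
LZMA_BUF_ERROR = 10

class ToyCoder
  def initialize
    @stalled = false # true once the previous call made no progress
  end

  # input:     String of pending input bytes
  # out_space: number of free bytes in the output buffer
  # Returns [status, bytes_consumed].
  def code(input, out_space)
    consumed = [input.bytesize, out_space].min
    if consumed.zero?
      # The first call without progress still reports LZMA_OK; only the
      # second consecutive one reports LZMA_BUF_ERROR.
      status = @stalled ? LZMA_BUF_ERROR : LZMA_OK
      @stalled = true
      return [status, 0]
    end
    @stalled = false
    [LZMA_OK, consumed]
  end
end

coder = ToyCoder.new
results = [
  coder.code("", 16),    # no input: first stall
  coder.code("", 16),    # second consecutive stall
  coder.code("abc", 16), # progress resets the stall flag
  coder.code("", 16)     # first stall again
].map(&:first)
# results is [LZMA_OK, LZMA_BUF_ERROR, LZMA_OK, LZMA_OK]
```

In this model, a caller that hands over empty input twice in a row sees LZMA_BUF_ERROR even though nothing is corrupt.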


Quintus · 2018-07-26T07:11:40Z

I'm not entirely sure if completely ignoring the error is a good idea. It could silently truncate the decompressed data. If liblzma can't continue, then liblzma is probably not outputting any data in lzma_code(), thus causing XZ::lzma_code to terminate with no apparent error.

The docs you quoted explicitly say that one shouldn't usually come across this error unless the input data is corrupt. If the input data is corrupt, the correct way is to raise an exception to prevent silent data corruption.

Maybe it's thus better to instead leave the decision on what to do with LZMA_BUF_ERROR to the user, in the form of an additional argument or a global option on the XZ module? It would default to raising, but if the user knows what he's doing, he could choose to ignore it.
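A minimal sketch of what such an option could look like, assuming a keyword argument with two policies. Every name here (`BufErrorPolicy`, `check`, `on_buf_error`, `BufferError`) is hypothetical and not part of ruby-xz; only the two status codes come from liblzma.

```ruby
# Hypothetical names throughout; this is not ruby-xz's API.
module BufErrorPolicy
  LZMA_OK        = 0
  LZMA_BUF_ERROR = 10

  class BufferError < StandardError; end

  # Returns true if coding may continue, raises otherwise.
  # on_buf_error: :raise (the default) or :ignore.
  def self.check(status, on_buf_error: :raise)
    return true if status == LZMA_OK

    if status == LZMA_BUF_ERROR
      case on_buf_error
      when :ignore then return true
      when :raise  then raise BufferError, "LZMA_BUF_ERROR: no progress possible"
      else raise ArgumentError, "unknown policy: #{on_buf_error.inspect}"
      end
    end

    # Other liblzma status codes are outside the scope of this sketch.
    raise ArgumentError, "status #{status} is not handled by this sketch"
  end
end

BufErrorPolicy.check(10, on_buf_error: :ignore) # returns true
# BufErrorPolicy.check(10) would raise BufErrorPolicy::BufferError
```

Defaulting to `:raise` keeps the current behaviour, so only callers who opt in would see the error swallowed.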

Quintus · 2018-07-26T07:15:08Z

Mh, little mistake on my side. I was talking about this line, but it of course only terminates the inner loop. If lzma_code() doesn't fill the output buffer, XZ::lzma_code is going to feed it the next input, if any. So there wouldn't be truncation, but it still feels wrong to silently eat the error code. I'd still say there should be a way for the user to configure what to do.

genail · 2018-08-01T19:25:54Z

Thank you for looking into my PR.

Yeah, it kinda felt wrong, but I'm not sure I understand in what sense the encoded stream is "truncated". I'm decoding large portions of data (tens of gigabytes) and it happens to me quite often. Yes, my stream is truncated, but this is how streams work, duh! Maybe I am doing something wrong in the first place?

What I'm doing exactly is processing a stream through an AES decipher and the lzma library, and writing the result to a file. The size of the buffer passed to the lzma decoder is rather random (it depends on the AES decipher output). Is there a rule that I'm missing maybe, like passing 4 kB chunks or something like that?

If you don't know any, then I'll think of a configuration option.

Quintus · 2018-08-26T16:19:14Z

I'm sorry for being so terribly slow.

> Is there a rule that I'm missing maybe, like passing 4kb chunks or something like that?

There’s none. liblzma only requires you to tell it how large your input and output buffers are.

> If you don't know any, then I'll think of a configuration option.

That'd be nice. If you find a good way, feel free to file it as a PR again. Otherwise I'm going to look into the topic myself.

paulvt · 2019-10-02T14:16:08Z

This issue can even be triggered when streaming a tar to the stream writer as well! Sometimes empty data can be passed twice, or a block of NULL bytes can be passed (part of the tail of the tar file), both of which lead to LZMA_BUF_ERROR and thus to an unnecessary exception.

Attached is a tar file that cannot be compressed when passing it in chunks (simulating the tarring process). For this you need to gunzip the attached tarball first.

XZ::StreamWriter.open('/path/to/tmp/test2.tar.xz', external_encoding: 'binary') do |txz|
  File.open("/path/to/tmp/test2.tar", "rb") do |file|
    while chunk = file.read(4096)
      txz.write(chunk)
    end
  end
end
# => XZ::LZMAError: Buffer unusable!

Also, when using the basic example from the homepage that does streaming tar packing and compressing, the data cannot be compressed. For this you need to extract the tarball to some location first.

XZ::StreamWriter.open('/path/to/tmp/test2.tar.xz', external_encoding: 'binary') do |txz|
  Minitar.pack('/path/to/tmp/test2', txz)
end
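For the empty-data case only, one possible caller-side workaround is to drop empty chunks before they reach the writer. The sketch below assumes nothing about ruby-xz beyond a `write` method on the target; `SkipEmptyWrites` and `RecordingSink` are hypothetical helpers, and the sink merely stands in for the stream writer so that the example runs on its own. It does not address the case of a block of NULL bytes.

```ruby
# Hypothetical wrapper: forwards writes to any object that responds to
# #write, but drops nil and empty chunks.
class SkipEmptyWrites
  def initialize(io)
    @io = io
  end

  def write(chunk)
    return 0 if chunk.nil? || chunk.empty?
    @io.write(chunk)
  end
end

# Stand-in for the real writer; records every chunk it receives.
class RecordingSink
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def write(chunk)
    @chunks << chunk
    chunk.bytesize
  end
end

sink = RecordingSink.new
out  = SkipEmptyWrites.new(sink)
["abc", "", "", "de"].each { |chunk| out.write(chunk) }
# sink.chunks is ["abc", "de"]; the two empty writes never reach the sink
```

Whether a wrapper like this is acceptable to a given producer depends on which methods that producer calls on its output object, so treat it as a starting point rather than a drop-in fix.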

Note that this issue was also fixed in the Python LZMA module for the same reason:
https://bugs.python.org/issue27517

Attached: test2.tar.gz

Quintus · 2021-11-07T09:46:43Z

I have decided to abandon this project due to lack of time to maintain it, and I don’t use it anymore. Hence, I am closing this issue now. Please fork if you want to continue maintaining it.

Ignore LZMA_BUF_ERROR

99e7d3c

It shouldn't be treated as error since it is produced when liblzma cannot generate any output yet.

paulvt added a commit to LeftClickBV/ruby-xz that referenced this pull request Oct 2, 2019

Don't raise for LZMA_BUF_ERROR (see Quintus#18)

20de90e

Quintus closed this Nov 7, 2021

win93 mentioned this pull request Nov 13, 2021

Investigate the inclusion of PRs to the previous repo win93/ruby-xz#4

Open
