This repository has been archived by the owner on Nov 14, 2021. It is now read-only.

Ignore LZMA_BUF_ERROR #18

Closed


genail
Contributor

@genail genail commented Jul 15, 2018

It shouldn't be treated as an error, since it is produced when liblzma cannot generate any output yet. The documentation is a little unclear on the matter, but this error seems to recur on some data inputs, and ignoring it does not do any harm.

```c
	LZMA_BUF_ERROR          = 10,
		/**<
		 * \brief       No progress is possible
		 *
		 * This error code is returned when the coder cannot consume
		 * any new input and produce any new output. The most common
		 * reason for this error is that the input stream being
		 * decoded is truncated or corrupt.
		 *
		 * This error is not fatal. Coding can be continued normally
		 * by providing more input and/or more output space, if
		 * possible.
		 *
		 * Typically the first call to lzma_code() that can do no
		 * progress returns LZMA_OK instead of LZMA_BUF_ERROR. Only
		 * the second consecutive call doing no progress will return
		 * LZMA_BUF_ERROR. This is intentional.
		 *
		 * With zlib, Z_BUF_ERROR may be returned even if the
		 * application is doing nothing wrong, so apps will need
		 * to handle Z_BUF_ERROR specially. The above hack
		 * guarantees that liblzma never returns LZMA_BUF_ERROR
		 * to properly written applications unless the input file
		 * is truncated or corrupt. This should simplify the
		 * applications a little.
		 */
```
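
The two-call rule in the quoted comment can be illustrated with a toy model. The sketch below is plain Ruby and does not call liblzma; `ToyCoder` and its `code` method are hypothetical names, and only the numeric values of `LZMA_OK` and `LZMA_BUF_ERROR` are taken from liblzma's header.

```ruby
# Toy model of the rule quoted above; it does NOT call liblzma.
# ToyCoder and #code are hypothetical names used for illustration only.
LZMA_OK        = 0   # value from liblzma's lzma/base.h
LZMA_BUF_ERROR = 10  # value from liblzma's lzma/base.h

class ToyCoder
  def initialize
    @stalled = false # true if the previous call made no progress
  end

  # "Codes" input by pretending to copy it, limited by out_space bytes.
  # Returns [return_code, bytes_processed].
  def code(input, out_space)
    processed = [input.bytesize, out_space].min
    if processed > 0
      @stalled = false
      return [LZMA_OK, processed]
    end
    # No progress: the first such call still reports LZMA_OK,
    # only the second consecutive one reports LZMA_BUF_ERROR.
    result = @stalled ? LZMA_BUF_ERROR : LZMA_OK
    @stalled = true
    [result, 0]
  end
end

coder  = ToyCoder.new
first  = coder.code("", 4096).first     # no input: first stall
second = coder.code("", 4096).first     # second consecutive stall
third  = coder.code("abc", 4096).first  # progress resets the state
```

In this model a caller that hands over empty input twice in a row sees `LZMA_BUF_ERROR` even though nothing is wrong with the stream itself, which is the kind of situation this PR is about.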

@Quintus
Owner

Quintus commented Jul 26, 2018

I'm not entirely sure that completely ignoring the error is a good idea. It could silently truncate the decompressed data. If liblzma can't continue, then liblzma is probably not outputting any data in lzma_code(), thus causing XZ::lzma_code to terminate with no apparent error.

The docs you quoted explicitly say that one shouldn't usually come across this error unless the input data is corrupt. If the input data is corrupt, the correct way is to raise an exception to prevent silent data corruption.

Maybe it's better, then, to leave the decision on what to do with LZMA_BUF_ERROR to the user, in the form of an additional argument or a global option on the XZ module? It would default to raising, but if the user knows what he's doing, he could choose to ignore it.
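
A minimal sketch of what such an option could look like, assuming a helper that inspects the return code of each lzma_code() call. ruby-xz has no such option; `check_lzma_code`, the `on_buf_error:` keyword, and the local `LZMAError` class are hypothetical. Only the numeric return codes come from liblzma's header.

```ruby
# Hypothetical sketch: ruby-xz has no such option. All names here are
# invented for illustration, except the numeric liblzma return codes.
LZMA_OK         = 0
LZMA_STREAM_END = 1
LZMA_BUF_ERROR  = 10

class LZMAError < StandardError; end

# Decides what to do with a liblzma return code.
# on_buf_error: :raise (the default) or :ignore.
def check_lzma_code(code, on_buf_error: :raise)
  unless %i[raise ignore].include?(on_buf_error)
    raise ArgumentError, "on_buf_error must be :raise or :ignore"
  end

  case code
  when LZMA_OK, LZMA_STREAM_END
    code
  when LZMA_BUF_ERROR
    raise LZMAError, "Buffer unusable!" if on_buf_error == :raise
    LZMA_OK # caller explicitly asked to treat "no progress" as non-fatal
  else
    raise LZMAError, "liblzma returned error code #{code}"
  end
end
```

Defaulting to `:raise` keeps the current behaviour, so only callers who opt in would have the error suppressed.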

@Quintus
Owner

Quintus commented Jul 26, 2018

Mh, little mistake on my side. I was talking about this line, but it of course only terminates the inner loop. If lzma_code() doesn't fill the output buffer, XZ::lzma_code is going to feed it the next input, if any. So there wouldn't be truncation, but it still feels wrong to silently eat the error code. I'd still say there should be a way for the user to configure what to do.

@genail
Contributor Author

genail commented Aug 1, 2018

Thank you for looking into my PR.

Yeah, it kinda felt wrong, but I'm not sure I understand in what sense the encoded stream is "truncated". I'm decoding large amounts of data (tens of gigabytes) and it happens to me quite often. Yes, my stream is truncated, but that's how streams work, duh! Maybe I am doing something wrong in the first place?

What I'm doing exactly is passing a stream through an AES decipher and then the lzma library, and writing the result to a file. The size of the buffer passed to the lzma decoder is rather random (it depends on the AES decipher output). Is there a rule that I'm missing maybe, like passing 4kb chunks or something like that?

If you don't know any, then I'll think of a configuration option.

@Quintus
Owner

Quintus commented Aug 26, 2018

I'm sorry for being so terribly slow.

> Is there a rule that I'm missing maybe, like passing 4kb chunks or something like that?

There’s none. liblzma only requires you to tell it how large your input and output buffers are.

> If you don't know any, then I'll think of a configuration option.

That'd be nice. If you find a good way, feel free to file it as a PR again. Otherwise I'm going to look into the topic myself.
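
To illustrate the point that a streaming decoder should not care about chunk sizes, here is a small sketch that feeds deliberately irregular chunks to a decoder. It uses Ruby's bundled Zlib as a stand-in, because liblzma bindings are not part of the standard library; it therefore says nothing about LZMA_BUF_ERROR itself. The sample data and the chunk size range are arbitrary assumptions.

```ruby
require "zlib"

# Stand-in demo using Ruby's bundled Zlib instead of liblzma:
# feed a streaming decoder chunks of arbitrary, uneven sizes.
original   = ("ruby-xz " * 5_000).b
compressed = Zlib::Deflate.deflate(original)

rng      = Random.new(42) # fixed seed so the run is repeatable
inflater = Zlib::Inflate.new
restored = String.new(encoding: Encoding::BINARY)
offset   = 0

while offset < compressed.bytesize
  size  = rng.rand(1..37)                # deliberately irregular sizes
  chunk = compressed.byteslice(offset, size)
  restored << inflater.inflate(chunk)    # appends whatever is ready so far
  offset += chunk.bytesize
end
restored << inflater.finish              # flush any remaining output
inflater.close

puts restored == original
```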

@paulvt

paulvt commented Oct 2, 2019

This issue can even be triggered when streaming a tar to the stream writer! Sometimes either empty data is passed twice, or a block of NUL bytes is passed (part of the tail of the tar file). Both lead to LZMA_BUF_ERROR, which unnecessarily raises an exception.

Attached is a tar file that cannot be compressed when passing it in chunks (simulating the tarring process). For this you need to gunzip the attached tarball first.

```ruby
XZ::StreamWriter.open('/path/to/tmp/test2.tar.xz', external_encoding: 'binary') do |txz|
  File.open("/path/to/tmp/test2.tar", "rb") do |file|
    while chunk = file.read(4096)
      txz.write(chunk)
    end
  end
end
# => XZ::LZMAError: Buffer unusable!
```
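
For the empty-data case described above, one possible caller-side workaround is to filter out zero-length writes before they reach the compressor. This is only a sketch: `SkipEmptyWrites` is a hypothetical wrapper, not part of ruby-xz, and a `StringIO` stands in for the real writer so the example runs without the gem.

```ruby
require "stringio"

# Hypothetical wrapper, not part of ruby-xz: drops zero-length writes
# before they reach the wrapped writer (e.g. an XZ::StreamWriter).
class SkipEmptyWrites
  def initialize(io)
    @io = io
  end

  # Returns 0 for nil or empty chunks, otherwise whatever the
  # wrapped writer's #write returns.
  def write(chunk)
    return 0 if chunk.nil? || chunk.empty?
    @io.write(chunk)
  end
end

# Stand-in target so the sketch runs without the gem installed.
sink   = StringIO.new
writer = SkipEmptyWrites.new(sink)
["abc", "", "", "def"].each { |c| writer.write(c) }
```

This only covers empty writes. It would not change anything for the block of NUL bytes at the tail of a tar file.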

Also, when using the basic example from the homepage that does streaming tar packing and compressing, the data cannot be compressed. For this you need to extract the tarball to some location first.

```ruby
XZ::StreamWriter.open('/path/to/tmp/test2.tar.xz', external_encoding: 'binary') do |txz|
  Minitar.pack('/path/to/tmp/test2', txz)
end
```

Note that this issue was also fixed in the Python LZMA module for the same reason:
https://bugs.python.org/issue27517

Attached: test2.tar.gz

paulvt added a commit to LeftClickBV/ruby-xz that referenced this pull request Oct 2, 2019
@Quintus
Owner

Quintus commented Nov 7, 2021

I have decided to abandon this project because I lack the time to maintain it and no longer use it. Hence, I am closing this issue now. Please fork if you want to continue maintaining it.
