CRC check failed when reading after seeking #73

Closed
mxmlnkn opened this issue Sep 13, 2021 · 2 comments

mxmlnkn commented Sep 13, 2021

System Information:

  • Ubuntu 20.10
  • Python 3.8.10
  • rarfile 4.0
  • unar v1.10.1
  • UNRAR 5.61 beta 1

Both unar and unrar are in my PATH, so I don't know which one is used. I don't think I have bsdtar installed.
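
To narrow down which backends are available at all, a small check like the one below lists the candidate tools found on PATH. It relies only on shutil.which from the standard library and does not reveal which tool rarfile actually picks; the helper name find_extraction_tools is made up for this sketch.

```python
import shutil


def find_extraction_tools(names=("unrar", "unar", "bsdtar")):
    """Map each tool name to its resolved path on PATH, or None if it is absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_extraction_tools().items():
        print(f"{name}: {path or 'not found'}")
```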

Steps to reproduce:

  1. Create test rar: echo foo > bar && rar a bar.rar bar
  2. Open with rarfile and seek and read:
import rarfile
rar = rarfile.RarFile("bar.rar")
file = rar.open("bar")
# These read calls were only to show that rarfile generally works but it seems they are somewhat important for reproduction!
file.read(1)  # b'f'
file.read(1)  # b'o'
file.read(1)  # b'o'
file.read(1)  # b'\n'
file.read(1)  # b''
# Seeking to 0 is no problem. Again, these calls can be omitted for reproducing the problem
file.seek(0)
file.read()   # b'foo\n'
# Here begins the problematic sequence
file.seek(1)  # 1
file.read()
---------------------------------------------------------------------------
BadRarFile                                Traceback (most recent call last)
<ipython-input-42-f3fc120c03c1> in <module>
----> 1 file.read()

~/.local/lib/python3.8/site-packages/rarfile.py in read(self, n)
   2200         if not data or self._remain == 0:
   2201             # self.close()
-> 2202             self._check()
   2203         return data
   2204 

~/.local/lib/python3.8/site-packages/rarfile.py in _check(self)
   2216             raise BadRarFile("Failed the read enough data")
   2217         if final != exp:
-> 2218             raise BadRarFile("Corrupt file - CRC check failed: %s - exp=%r got=%r" % (
   2219                 self._inf.filename, exp, final))
   2220 

BadRarFile: Corrupt file - CRC check failed: bar - exp=2117232040 got=3195718521

Forward seeking does not seem to be a problem. This works:

rar = rarfile.RarFile("bar.rar")
file = rar.open("bar")
file.seek(1)
file.read()

However, as soon as I seek backwards, the problem appears even with crc_check=False, which makes it even weirder!

rar = rarfile.RarFile("bar.rar", crc_check=False)
file = rar.open("bar")
file.read(2)
file.seek(1)
file.read()  # exception!
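
To map out exactly which read-then-seek combinations fail, a brute-force check along these lines might help. It is only a sketch: find_bad_seek_sequences is a hypothetical helper, not part of rarfile, and it assumes the opened object is a seekable binary file that can be used as a context manager.

```python
import io


def find_bad_seek_sequences(open_file, expected):
    """Try every (bytes read first, seek target) pair and collect the failures.

    open_file: zero-argument callable returning a fresh binary file object.
    expected:  the full expected file contents as bytes.
    """
    failures = []
    for n_read in range(len(expected) + 1):
        for target in range(len(expected) + 1):
            with open_file() as f:
                try:
                    f.read(n_read)
                    f.seek(target)
                    data = f.read()
                except Exception as exc:
                    failures.append((n_read, target, repr(exc)))
                    continue
                if data != expected[target:]:
                    failures.append((n_read, target, data))
    return failures


# Sanity check against an in-memory file, which should never fail.
print(find_bad_seek_sequences(lambda: io.BytesIO(b"foo\n"), b"foo\n"))  # prints []
```

For the archive above, passing something like lambda: rarfile.RarFile("bar.rar").open("bar") together with b"foo\n" should list the failing pairs; I have not run that variant here.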

mxmlnkn commented Sep 25, 2021

I took a quick look at the source and documentation and it seems that backward seeking is supposed to be implemented by reopening the file. Somehow that reopen isn't effective enough. My workaround, which also simply reopens the file, works without problems:

import io


class RawFileInsideRar(io.RawIOBase):
    def __init__(self, reopen, file_size):
        self.reopen = reopen
        self.fileobj = reopen()
        self.file_size = file_size

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, exception_traceback):
        self.close()

    def close(self) -> None:
        self.fileobj.close()

    def fileno(self) -> int:
        # This is a virtual Python level file object and therefore does not have a valid OS file descriptor!
        raise io.UnsupportedOperation()

    def seekable(self) -> bool:
        return self.fileobj.seekable()

    def readable(self) -> bool:
        return self.fileobj.readable()

    def writable(self) -> bool:
        return False

    def read(self, size: int = -1) -> bytes:
        return self.fileobj.read(size)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_CUR:
            offset += self.tell()
        elif whence == io.SEEK_END:
            offset += self.file_size

        if offset >= self.tell():
            return self.fileobj.seek(offset, io.SEEK_SET)

        self.fileobj = self.reopen()
        return self.fileobj.seek(offset, io.SEEK_SET)

    def tell(self) -> int:
        return self.fileobj.tell()

Replacing the rar.open("bar") call in my minimal failing examples with the following two lines makes them run just fine:

info = rar.getinfo("bar")
file = RawFileInsideRar(lambda: rar.open(info), info.file_size)
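
The same reopen-on-backward-seek idea can be tried without any archive. The sketch below uses a hypothetical ForwardOnlyStream as a stand-in for a stream that cannot rewind; seek_by_reopening is likewise an illustrative name, and none of this is meant to mirror how rarfile implements seeking internally.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a stream that can be read but never rewound."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def read(self, size=-1):
        return self._buffer.read(size)

    def tell(self):
        return self._buffer.tell()


def seek_by_reopening(stream, reopen, offset):
    """Return a stream positioned at offset, reopening if offset lies behind."""
    if offset < stream.tell():
        stream.close()
        stream = reopen()  # start again from byte 0
    to_skip = offset - stream.tell()
    while to_skip > 0:
        chunk = stream.read(min(to_skip, 64 * 1024))
        if not chunk:
            break  # offset is past the end of the data
        to_skip -= len(chunk)
    return stream


reopen = lambda: ForwardOnlyStream(b"foo\n")
stream = reopen()
stream.read(2)                                 # consume b"fo"
stream = seek_by_reopening(stream, reopen, 1)  # backward: reopen, then skip 1 byte
print(stream.read())                           # prints b'oo\n'
```

Forward seeks are handled by reading and discarding bytes, so a fresh stream is only opened when the target lies behind the current position.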

markokr commented Sep 17, 2023

Thanks for the report!
