open stream can raise a FzErrorFormat error instead of FileDataError #3905

Closed
cbm755 opened this issue Sep 28, 2024 · 5 comments
Labels
fix developed release schedule to be determined Fixed in next release

Comments

@cbm755
Contributor

cbm755 commented Sep 28, 2024

Description of the bug

If I feed a .csv file to pymupdf.open, I get a FileDataError, as documented:

If you attempt to open an unsupported file then PyMuPDF will throw a file data error.

But if I instead pass the bytes of the same file to stream= I get an FzErrorFormat, which I was not expecting from the docs.

How to reproduce the bug

with open('myfile.csv', 'rb') as f:
    file_bytes = f.read()

It probably doesn't matter what's in the CSV file, but here's mine:

>> file_bytes
b'A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n'

Now we try to open this:

>> pymupdf.open(stream=file_bytes)
---------------------------------------------------------------------------
FzErrorFormat                             Traceback (most recent call last)
<ipython-input-21-668e9798a921> in ?()
----> 1 pymupdf.open(stream=file_bytes)

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(magic, stream)
  44292 
  44293         NOTE: The caller retains ownership of 'stream' - the document will take its
  44294         own reference if required.
  44295     """
> 44296     return _mupdf.fz_open_document_with_stream(magic, stream)

FzErrorFormat: code=7: no objects found

Contrast this with what happens when I open the file directly:

pymupdf.open("myfile.csv")
---------------------------------------------------------------------------
FzErrorUnsupported                        Traceback (most recent call last)
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(filename)
  44271         filename: a path to a file as it would be given to open(2).
  44272     """
> 44273     return _mupdf.fz_open_document(filename)

FzErrorUnsupported: code=6: cannot find document handler for file: myfile.csv

The above exception was the direct cause of the following exception:

FileDataError                             Traceback (most recent call last)
<ipython-input-22-b19d9e4e2772> in ?()
----> 1 pymupdf.open("myfile.csv")

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

FileDataError: Failed to open file 'myfile.csv'.

(We can see it still fails with FzErrorUnsupported, but this ultimately raises FileDataError as documented.)

PyMuPDF version

1.24.10

Operating system

Linux

Python version

3.12

@julian-smith-artifex-com
Collaborator

julian-smith-artifex-com commented Sep 28, 2024

The difference here is that in your file case the filename suffix .csv is available so PyMuPDF tries to open the file as a comma-separated file.

Whereas your stream case just specifies the data with no filetype information, so PyMuPDF tries to open the stream as a PDF by default.

Instead you can specify the filetype explicitly with pymupdf.open(stream=file_bytes, filetype='csv').
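When only bytes and the original filename are available, the filetype hint can be derived from the filename suffix before opening the stream. The following is a minimal sketch, not part of PyMuPDF: `filetype_hint` is a hypothetical helper name, and the `"pdf"` fallback is an assumption that mirrors the default described above.

```python
from pathlib import Path


def filetype_hint(filename: str, default: str = "pdf") -> str:
    """Return a filetype string taken from the filename suffix.

    Hypothetical helper: falls back to `default` when the name has no suffix.
    """
    # Path.suffix returns e.g. ".csv"; strip the dot and normalise the case.
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix or default


print(filetype_hint("myfile.csv"))
```

The result could then be passed as the `filetype` argument, for example `pymupdf.open(stream=file_bytes, filetype=filetype_hint("myfile.csv"))`.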

@cbm755
Contributor Author

cbm755 commented Sep 28, 2024

Right, but my concern here is about the exception I get. From the docs, I expected FileDataError.

I could certainly change my code to also check for pymupdf.mupdf.FzErrorBase (and try to send you a docs patch) but that seems a bit "private" to me (?)

For example, I have changed my code as follows:

         try:
             ...
         except (pymupdf.FileDataError, KeyError) as e:
             raise ValidationError(f"Unable to open file: {e}") from e
+        except pymupdf.mupdf.FzErrorBase as e:
+            # https://github.com/pymupdf/PyMuPDF/issues/3905
+            raise ValidationError(
+                f"Perhaps not a pdf file?  Unexpected error: {e}"
+            ) from e

cbm755 changed the title from "open stream can raise a FzErrorFormat error: is that intended?" to "open stream can raise a FzErrorFormat error instead of FileDataError" on Sep 28, 2024
@cbm755
Contributor Author

cbm755 commented Sep 28, 2024

(I have edited the issue title to clarify that I'm asking what sort of exception I should get here.)

@julian-smith-artifex-com
Collaborator

Ah, I see what you mean. The problem is that we don't wrap the internal call to fz_open_document_with_stream() the way we wrap fz_open_document(), so the underlying MuPDF exception leaks out.

I have a fix in my tree.

julian-smith-artifex-com added the "fix developed release schedule to be determined" label on Sep 30, 2024
julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Sep 30, 2024
…ocument_with_stream() fails.

We wrap fz_open_document_with_stream(), converting exception to FileDataError;
this makes things behave similarly to when we call fz_open_document().

Addresses pymupdf#3905.
julian-smith-artifex-com added a commit that referenced this issue Sep 30, 2024
…ocument_with_stream() fails.

We wrap fz_open_document_with_stream(), converting exception to FileDataError;
this makes things behave similarly to when we call fz_open_document().

Addresses #3905.
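
The wrapping that the commit message describes follows a common Python pattern: catch the low-level exception and re-raise the documented one with `raise ... from`. The sketch below illustrates only that pattern and is not the actual PyMuPDF change. Every class and function in it is a hypothetical stand-in, and the `%PDF-` check is an invented placeholder for the real parser.

```python
class FzErrorBase(Exception):
    """Stand-in for the low-level MuPDF binding exception (hypothetical)."""


class FileDataError(Exception):
    """Stand-in for the documented, user-facing exception (hypothetical)."""


def low_level_open_stream(data: bytes) -> dict:
    """Stand-in for the unwrapped low-level call; rejects non-PDF bytes."""
    if not data.startswith(b"%PDF-"):
        raise FzErrorBase("code=7: no objects found")
    return {"size": len(data)}


def open_stream(data: bytes) -> dict:
    """Wrap the low-level call so callers only see FileDataError."""
    try:
        return low_level_open_stream(data)
    except FzErrorBase as e:
        # Chain the original error so it stays visible in the traceback.
        raise FileDataError("Failed to open stream") from e


try:
    open_stream(b"A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n")
except FileDataError as e:
    print(f"{type(e).__name__}: {e} (caused by {type(e.__cause__).__name__})")
```

Because the original exception is chained as `__cause__`, the low-level detail is not lost, which matches the "direct cause" traceback shown for the filename case.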
@julian-smith-artifex-com
Collaborator

Fixed in 1.24.11.
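
Code that must run on versions both before and after the fix may want to decide whether the extra `except` clause from the workaround is still needed. The following is a minimal sketch of such a check. `needs_workaround` is a hypothetical helper, and it assumes the version string has purely numeric dot-separated components. Comparing tuples of integers avoids the string-comparison pitfall where "1.24.9" sorts after "1.24.11".

```python
def needs_workaround(version: str, fixed_in: tuple = (1, 24, 11)) -> bool:
    """Return True if `version` predates the release containing the fix.

    Hypothetical helper; assumes components such as "1.24.10" are all numeric.
    """
    # Compare as integer tuples, not as strings.
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts < fixed_in


print(needs_workaround("1.24.10"))
```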
