❗️ v0.3.11 is broken #9

lsorber · 2024-10-08T12:35:31Z

The new v0.3.11 release seems to be broken. Minimal reproducible example (e.g., in a Google Colab):

%pip install --quiet pdftext==0.3.11

from pathlib import Path
from pdftext.extraction import dictionary_output

pages = dictionary_output(Path("specrel.pdf"), sort=True, keep_chars=False)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-783804a94b7a> in <cell line: 4>()
      2 from pdftext.extraction import dictionary_output
      3 
----> 4 pages = dictionary_output(Path("specrel.pdf"), sort=True, keep_chars=False)

2 frames
/usr/local/lib/python3.10/dist-packages/pdftext/extraction.py in _load_pdf(pdf, flatten_pdf)
     17 
     18     if not isinstance(pdf, pdfium.PdfDocument):
---> 19         raise TypeError("pdf must be a file path string or a PdfDocument object")
     20 
     21     # Must be called on the parent pdf, before the page was retrieved

TypeError: pdf must be a file path string or a PdfDocument object
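The traceback shows the failure comes from passing a pathlib.Path. The error message says file path strings are accepted, so on 0.3.11 passing the path as a plain str should sidestep the check (an assumption based on that message, not a tested fix). Below is a minimal sketch of that workaround. `to_path_string` is a hypothetical helper, not part of pdftext. It normalizes str, bytes, and pathlib.Path inputs with `os.fspath`. The pdftext call is left commented out so the snippet runs without pdftext or specrel.pdf installed.

```python
import os
from pathlib import Path
from typing import Union


# Hypothetical helper (not part of pdftext): coerce any path-like input
# to a plain str before handing it to a loader that only accepts str.
def to_path_string(pdf: Union[str, bytes, os.PathLike]) -> str:
    # os.fspath returns str for str/Path inputs and bytes for bytes inputs
    path = os.fspath(pdf)
    if isinstance(path, bytes):
        # Decode bytes paths using the filesystem encoding
        path = os.fsdecode(path)
    return path


# Assumed usage on pdftext 0.3.11 (str paths appear to be accepted there):
# from pdftext.extraction import dictionary_output
# pages = dictionary_output(to_path_string(Path("specrel.pdf")), sort=True, keep_chars=False)
```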


VikParuchuri · 2024-10-08T13:14:20Z

Fix is out (v0.3.12). Thanks for the catch!

lsorber mentioned this issue Oct 8, 2024

fix: avoid pdftext v0.3.11 superlinear-ai/raglite#27

Merged
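A downstream project can also refuse the bad release at runtime, alongside an install-time specifier such as `pdftext>=0.3.12`. The sketch below is an assumption. It is not the change made in raglite#27, whose diff isn't shown here. `BROKEN_PDFTEXT_VERSIONS`, `pdftext_is_broken`, and `check_installed_pdftext` are hypothetical names. The check reads the installed version with `importlib.metadata.version`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical set of releases known to be broken (from this issue).
BROKEN_PDFTEXT_VERSIONS = {"0.3.11"}


def pdftext_is_broken(installed: str) -> bool:
    """Return True if the given pdftext version string is a known-broken release."""
    return installed in BROKEN_PDFTEXT_VERSIONS


def check_installed_pdftext() -> None:
    """Raise if the installed pdftext is a known-broken release."""
    try:
        installed = version("pdftext")
    except PackageNotFoundError:
        return  # pdftext is not installed, so there is nothing to check
    if pdftext_is_broken(installed):
        raise RuntimeError(
            f"pdftext {installed} is known to be broken; upgrade to 0.3.12 or later"
        )
```

Calling `check_installed_pdftext()` once at import time would surface the problem before any PDF is parsed.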

VikParuchuri closed this as completed Oct 8, 2024
