Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to new pypdfium2 API #845

Merged
merged 4 commits into from
Mar 15, 2022
Merged

Switch to new pypdfium2 API #845

merged 4 commits into from
Mar 15, 2022

Conversation

mara004
Copy link
Contributor

@mara004 mara004 commented Mar 8, 2022

Modifications to use the new pypdfium2 API (see #844).
I did not dare to try render_pdf_tobytes(), as I think that might change the colour format from RGB to BGR.

@fg-mindee fg-mindee self-assigned this Mar 9, 2022
@fg-mindee fg-mindee self-requested a review March 9, 2022 10:56
@fg-mindee fg-mindee added the ext: docs Related to docs folder label Mar 9, 2022
Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR!

It's much appreciated that upstream projects help with the transition from one version of their project to the next 🙏 I added a few comments, let me know what you think!

doctr/io/pdf.py Show resolved Hide resolved
doctr/io/pdf.py Outdated Show resolved Hide resolved
@mara004
Copy link
Contributor Author

mara004 commented Mar 14, 2022

I have now released pypdfium2 1.0.0 and applied the review suggestions. Could you please re-run CI?

@mara004 mara004 marked this pull request as ready for review March 14, 2022 16:20
I need to do something to make these paths shorter.
@fg-mindee
Copy link
Contributor

I have now released pypdfium2 1.0.0 and applied the review suggestions. Could you please re-run CI?

Thanks for the ping 🙏

I'm all for loose version specifiers as long as they don't put too much constraints to avoid pushing to many env constraints on the user. So here are a few questions:

  • are there dependency changes with this new release?
  • apart from the import/call of the function to render the page, does the behaviour change in any noticeable way in release 1.0.0?

If it's only a change of import location, perhaps we could have a dynamic import depending on the version of pypfdium2. But if there are no modifications on version specifiers with this new release, then it's not worth going through this trouble and we can move forward with the PR as is :)

@fg-mindee fg-mindee added module: io Related to doctr.io topic: build Related to dependencies and build type: misc Miscellaneous labels Mar 14, 2022
@codecov
Copy link

codecov bot commented Mar 14, 2022

Codecov Report

Merging #845 (d3f6d9b) into main (bc03a0f) will decrease coverage by 0.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #845      +/-   ##
==========================================
- Coverage   94.84%   94.82%   -0.02%     
==========================================
  Files         133      133              
  Lines        5197     5200       +3     
==========================================
+ Hits         4929     4931       +2     
- Misses        268      269       +1     
Flag Coverage Δ
unittests 94.82% <83.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
doctr/io/pdf.py 93.33% <83.33%> (-6.67%) ⬇️
doctr/transforms/modules/base.py 94.59% <0.00%> (ø)
doctr/transforms/functional/base.py 95.65% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc03a0f...d3f6d9b. Read the comment docs.

@mara004
Copy link
Contributor Author

mara004 commented Mar 14, 2022

are there dependency changes with this new release?

The dependency on PIL/Pillow has been made optional. This means the core library now does not have external runtime dependencies anymore. Otherwise, there are no dependency changes.

If it's only a change of import location, perhaps we could have a dynamic import depending on the version of pypfdium2.

It essentially is a renaming from render_pdf() to render_pdf_topil(). Theoretically, I believe it would be possible to use hasattr() or check the version specifier to make the code work with older versions of pypdfium2. Probably somewhat like this:

import pypdfium2 as pdfium

if pdfium._version.V_MAJOR == 0:
    render_pdf_func = pdfium.render_pdf
else:
    render_pdf_func = pdfium.render_pdf_topil

# ...
return [np.asarray(img) for img, _ in render_pdf_func(file, scale=scale, **kwargs)]

However, I don't believe it's worth the additional code, considering that pypdfium2 < 1 has not been widely used.

apart from the import/call of the function to render the page, does the behaviour change in any noticeable way in release 1.0.0?

Other parts of the public API have not changed much; it's mainly improvements under the hood. You can refer to the Changelog for more details.

Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR 🙏 Indeed with the small dependency specifier modifications, it's not worth going through the trouble of a dynamic import!

@fg-mindee fg-mindee added this to the 0.5.1 milestone Mar 15, 2022
@fg-mindee fg-mindee merged commit 096eb48 into mindee:main Mar 15, 2022
@mara004
Copy link
Contributor Author

mara004 commented Mar 15, 2022

Thanks!

@mara004 mara004 deleted the pypdfium branch March 15, 2022 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ext: docs Related to docs folder module: io Related to doctr.io topic: build Related to dependencies and build type: misc Miscellaneous
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants