BUG: Infinite recursion when using PdfWriter(clone_from=reader) #2264
Use a visited memo to check if the current object in the clone operation has already been visited, and if so, do not add it to the list of objects. This avoids infinite recursion in case there are links to identical objects inside a PDF.
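The visited-memo idea can be illustrated with a minimal sketch. This is not pypdf's actual code: `Node` and `clone` are hypothetical stand-ins for PDF objects and the clone operation, assumed here only to show how a memo stops the recursion when two objects reference each other.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality; field-wise eq would recurse on cycles
class Node:
    """Hypothetical stand-in for a PDF object that may reference other objects."""

    name: str
    refs: list[Node] = field(default_factory=list)


def clone(node: Node, visited: dict[int, Node] | None = None) -> Node:
    """Deep-copy ``node``, reusing the copy of any object already visited."""
    if visited is None:
        visited = {}
    key = id(node)
    if key in visited:
        # Already cloned (or currently being cloned): return the existing copy
        # instead of descending again, which is what breaks the cycle.
        return visited[key]
    copy = Node(node.name)
    visited[key] = copy  # register BEFORE recursing into the references
    copy.refs = [clone(ref, visited) for ref in node.refs]
    return copy


# A page and an annotation that point at each other form a reference cycle.
page = Node("page")
annot = Node("annot", [page])
page.refs.append(annot)

page_copy = clone(page)  # terminates; without the memo this would recurse forever
```

The key design choice in this sketch is registering the copy in `visited` before following its references, so that a back-reference encountered during the descent resolves to the copy already in progress.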
I didn't apply the mypy lint to filters.py but would be more than happy to do so. My only major question would be then do I need to check if the PDFs are identical for the |
Codecov Report

Coverage diff, main vs. #2264:

| Metric | main | #2264 | +/- |
|---|---|---|---|
| Coverage | 94.44% | 94.43% | -0.01% |
| Files | 43 | 43 | |
| Lines | 7634 | 7643 | +9 |
| Branches | 1506 | 1508 | +2 |
| Hits | 7210 | 7218 | +8 |
| Misses | 262 | 262 | |
| Partials | 162 | 163 | +1 |
@Alexhuszagh |
I just got approval from the client to share the clean version and have sent it over. |
With the PDF you provided, I can confirm that there is an infinite loop causing 100% CPU utilization on the executing core. Hence I added the nf-security label.
Meta-information about the problematic file
It's not completely broken:
I can also open it just fine in Chrome / Evince. |
Thank you for your contribution 🙏 I will make a release in a few minutes that contains your fix. Additionally, I will also add a security advisory to update. What a great first PR! If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-) |
## What's new

### Security (SEC)
- Infinite recursion when using PdfWriter(clone_from=reader) (#2264) by @Alexhuszagh

### New Features (ENH)
- Add parameter to select images to be removed (#2214) by @pubpub-zz

### Bug Fixes (BUG)
- Correctly handle image mode 1 with FlateDecode (#2249) by @stefan6419846
- Error when filling a value with parentheses #2268 (#2269) by @KanorUbu
- Handle empty root outline (#2239) by @pubpub-zz

### Documentation (DOC)
- Improve merging docs (#2247) by @stefan6419846

### Developer Experience (DEV)
- Test Python 3.7 with cryptopgraphy provider as well (#2276) by @stefan6419846
- Run CI with windows-latest (#2258) by @MartinThoma
- Use pytest-xdist (#2254) by @MartinThoma
- Attribute correct authors in the release notes (#2246) by @stefan6419846

### Maintenance (MAINT)
- Apply pre-commit hooks (#2277) by @MartinThoma
- Update requirements + mypy fixes (#2275) by @MartinThoma
- Explicitly provide Any for IO generic argument (#2272) by @nilehmann

### Testing (TST)
- Fix test_image_without_pillow in windows environment (#2257) by @pubpub-zz

### Code Style (STY)
- Remove unused import by @MartinThoma

[Full Changelog](3.16.4...3.17.0)
That would be wonderful, thank you. |
Use a visited memo to check if the current object in the clone operation has already been visited, and if so, do not add it to the list of objects. This avoids infinite recursion in case there are references to identical objects inside a PDF.
Unfortunately, since the example PDF contains financial data from a client I'm not free to provide the failing example that caused this PR. If you have a suggestion on how to remove that data while keeping the references identical, I'd be more than happy to contribute that. I hope the implementation is logical enough so this isn't needed.
The code that fails is as simple as: