ENH: context manager for PdfReader #2666

Merged
merged 7 commits into from
May 26, 2024


tibor-reiss
Contributor

Closes #2665


codecov bot commented May 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.03%. Comparing base (08731fa) to head (cc3d491).
Report is 67 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2666      +/-   ##
==========================================
+ Coverage   95.01%   95.03%   +0.01%     
==========================================
  Files          50       50              
  Lines        8352     8372      +20     
  Branches     1673     1674       +1     
==========================================
+ Hits         7936     7956      +20     
  Misses        258      258              
  Partials      158      158              


@pubpub-zz
Collaborator

Can you please add a test for coverage?

@tibor-reiss tibor-reiss force-pushed the enh-context-manager-for-pdfreader branch from c626205 to cc8af1b Compare May 22, 2024 10:27
@tibor-reiss tibor-reiss force-pushed the enh-context-manager-for-pdfreader branch from cc8af1b to 84e0a4d Compare May 22, 2024 10:30
@stefan6419846
Collaborator

I am not completely sure: What is the goal of this change? The reading still happens in the __init__ method, and there are no resources which would require closing on exit, which is where a context manager would make sense.

@tibor-reiss
Contributor Author

tibor-reiss commented May 22, 2024

@stefan6419846 e.g. self.stream is not closed, which could result in a memory leak. Additionally, in the 2nd commit, I added some other attributes which have a similar issue.

You are right though that __init__ contains the read method - should this be moved to __enter__? However, that might break some existing code.
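
To make the compatibility concern concrete, here is a minimal sketch. Both classes (`EagerReader`, `LazyReader`) are hypothetical stand-ins, not pypdf code. It assumes only what this thread states: that the reading currently happens in `__init__`. If the read moved to `__enter__`, call sites that construct the reader without `with` would end up with an object that has not read anything yet.

```python
import io


class EagerReader:
    """Hypothetical stand-in: reads in __init__, as described in this thread."""

    def __init__(self, stream):
        self.data = stream.read()  # the work happens at construction time

    def __enter__(self):
        return self  # nothing else to do; __init__ already did the reading

    def __exit__(self, exc_type, exc, tb):
        return None


class LazyReader:
    """Hypothetical alternative: reading deferred to __enter__."""

    def __init__(self, stream):
        self.stream = stream
        self.data = None  # nothing has been read yet

    def __enter__(self):
        self.data = self.stream.read()
        return self

    def __exit__(self, exc_type, exc, tb):
        return None


# A call site that never uses `with` keeps working with the eager variant ...
eager = EagerReader(io.BytesIO(b"%PDF-1.7"))

# ... but would silently get an unread object with the lazy variant.
lazy = LazyReader(io.BytesIO(b"%PDF-1.7"))

# Only code that adopts `with` would see the data in the lazy variant.
with LazyReader(io.BytesIO(b"%PDF-1.7")) as entered:
    entered_data = entered.data
```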

@MasterOdin
Member

MasterOdin commented May 23, 2024

I think that self.stream should only be closed as part of __exit__ if it was created as part of __init__. If a user passes in a stream to PdfReader, then closing it may not be desired.

> You are right though that __init__ contains the read method - should this be moved to __enter__? However, that might break some existing code.

Unless the desire was to mandate that people must use a context manager, having __enter__ only return self and do nothing else (and leave all the work to __init__) is super common. You would also potentially want to have a close method which __exit__ calls so that users who don't use a context manager can also close out their reader and avoid a memory leak.
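
A minimal sketch of the pattern described above, under stated assumptions: `ReaderSketch` and its `_owns_stream` flag are hypothetical names for illustration and are not taken from the pypdf source. The stream is closed only when the reader opened it itself, `__enter__` just returns `self`, and `__exit__` delegates to a public `close()` so that callers who do not use `with` can release the resource too.

```python
import io
import tempfile
from pathlib import Path


class ReaderSketch:
    """Hypothetical reader illustrating ownership-aware closing."""

    def __init__(self, source):
        if isinstance(source, (str, Path)):
            # The reader opened this file itself, so it is responsible for closing it.
            self.stream = open(source, "rb")
            self._owns_stream = True
        else:
            # The caller passed an open stream; leave its lifetime to the caller.
            self.stream = source
            self._owns_stream = False
        self.header = self.stream.read(8)  # all reading still happens in __init__

    def close(self):
        # Safe to call more than once; only closes streams the reader created.
        if self._owns_stream and not self.stream.closed:
            self.stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Case 1: the reader is given a path and opens the file itself.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.pdf"
    path.write_bytes(b"%PDF-1.7\n")
    with ReaderSketch(path) as owned:
        owned_header = owned.header
    owned_closed = owned.stream.closed

# Case 2: the caller passes in a stream, which stays open after the block.
user_stream = io.BytesIO(b"%PDF-1.7\n")
with ReaderSketch(user_stream) as borrowed:
    pass
borrowed_closed = user_stream.closed
```

In this sketch, `owned_closed` ends up `True` and `borrowed_closed` ends up `False`, which matches the distinction drawn above between streams created in `__init__` and streams supplied by the user.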

@tibor-reiss
Contributor Author

close() was also implemented in _merger.py - thanks for the hint @MasterOdin.

@pubpub-zz pubpub-zz merged commit b9920fa into py-pdf:main May 26, 2024
16 checks passed
@tibor-reiss tibor-reiss deleted the enh-context-manager-for-pdfreader branch May 26, 2024 11:17
@MartinThoma MartinThoma changed the title ENH: context manager for pdfreader ENH: context manager for PdfReader Jun 23, 2024
stefan6419846 added a commit that referenced this pull request Jun 23, 2024
## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (#2714) by @j-t-1

[Full Changelog](4.2.0...4.3.0)
Successfully merging this pull request may close these issues.

Implement '__enter__' in PdfReader
4 participants