Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle utf-16-encoded annotations (#463) #519

Merged
merged 1 commit into from
Oct 20, 2021
Merged

Handle utf-16-encoded annotations (#463) #519

merged 1 commit into from
Oct 20, 2021

Conversation

jsvine
Copy link
Owner

@jsvine jsvine commented Oct 20, 2021

Thanks to @tungph for the fix proposal.

Thanks to @tungph for the fix proposal.
@jsvine jsvine requested a review from samkit-jain October 20, 2021 00:27
@codecov
Copy link

codecov bot commented Oct 20, 2021

Codecov Report

Merging #519 (c3be777) into develop (4b61c38) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head c3be777 differs from pull request most recent head df98f9c. Consider uploading reports for the commit df98f9c to get more accurate results
Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #519   +/-   ##
========================================
  Coverage    98.30%   98.30%           
========================================
  Files           10       10           
  Lines         1236     1239    +3     
========================================
+ Hits          1215     1218    +3     
  Misses          21       21           
Impacted Files Coverage Δ
pdfplumber/page.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b61c38...df98f9c. Read the comment docs.

Copy link
Collaborator

@samkit-jain samkit-jain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR but this feels more like a hack. There could be a possibility in the future that we might have to add more encodings as defined here. A more reliable solution would be a logic that automatically identifies the codec to be used (which I don't think is even possible to do with 100% accuracy). Or perhaps, adding a parameter for the users to pass in the encoding.

@jsvine
Copy link
Owner Author

jsvine commented Oct 20, 2021

Thanks, and I agree: Not the world's most robust fix, but workable for now. I think the parameterizing is an intriguing idea, though I wonder: Are PDFs guaranteed to maintain encoding consistency throughout their annotations (or other content)? I suppose even if they do not, a user with a particularly tricky PDF could then try whatever combination of encodings suited them.

@jsvine jsvine merged commit 0c30a53 into develop Oct 20, 2021
@jsvine jsvine deleted the issue/463 branch October 20, 2021 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants