Ignore null characters in PSBaseParser #768

Merged (3 commits) on Jun 26, 2022

@cchristiansen cchristiansen commented Jun 9, 2022

Pull request

Previously, PSBaseParser tokenized null characters as PSKeyword tokens. This caused issue #617: pdfdevice.py expects a byte string at that point, so when it received the PSKeyword for the null character and tried to decode it, pdfminer.six crashed.

As null characters are superfluous within PSBaseParser, ignore them.
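
To illustrate the idea, here is a minimal, self-contained sketch. It is not pdfminer.six's code: `Keyword`, `tokenize`, `decode_all`, and the `ignore_nulls` flag are hypothetical names for a toy tokenizer, used only to show how a stray null character can turn into a keyword token that a byte-string consumer cannot handle, and how skipping it avoids the failure.

```python
from typing import List, Union


class Keyword:
    """Hypothetical stand-in for a keyword token (not pdfminer's PSKeyword)."""

    def __init__(self, name: bytes) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Keyword({self.name!r})"


Token = Union[bytes, Keyword]


def tokenize(data: bytes, ignore_nulls: bool = True) -> List[Token]:
    """Split data into "(...)" string tokens and one-byte keyword tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        c = data[i : i + 1]
        if c.isspace():
            i += 1
        elif c == b"\x00" and ignore_nulls:
            # The behaviour described in this PR: drop the null character.
            i += 1
        elif c == b"(":
            end = data.index(b")", i)
            tokens.append(data[i + 1 : end])
            i = end + 1
        else:
            # Fallback: any other byte becomes a keyword token.
            tokens.append(Keyword(c))
            i += 1
    return tokens


def decode_all(tokens: List[Token]) -> List[str]:
    """Toy consumer that assumes every token is a byte string."""
    return [bytes(t).decode("latin-1") for t in tokens]


sample = b"(Hello)\x00(World)"
print(tokenize(sample))  # [b'Hello', b'World']
print(tokenize(sample, ignore_nulls=False))
```

With `ignore_nulls=False` the token list contains a `Keyword` between the two strings, and passing it to `decode_all` raises a `TypeError` in this toy, loosely mirroring the crash described above.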

How Has This Been Tested?

A PDF attached to issue #617 reproduces the bug. With this commit applied, the bug no longer occurs.

Checklist

  • I have formatted my code with black.
  • I have added tests that prove my fix is effective or that my feature
    works
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial
    version
  • I have updated the README.md or verified that this
    is not necessary
  • I have updated the readthedocs documentation or
    verified that this is not necessary
  • I have added a concise human-readable description of the change to
    CHANGELOG.md

@cchristiansen cchristiansen marked this pull request as draft June 9, 2022 06:05
@cchristiansen cchristiansen marked this pull request as ready for review June 9, 2022 06:05
@pietermarsman
Member

Thanks for figuring this out, proposing the solution and creating the PR! 👏

@pietermarsman pietermarsman merged commit ebf92ac into pdfminer:master Jun 26, 2022
@cchristiansen
Author

And thank you for your work as a maintainer and keeping PDFMiner.six up-to-date!

Beants added a commit to HiTalentAlgorithms/pdfminer.six that referenced this pull request Aug 5, 2022
* commit '8f52578e85b27831ab8a68a6d86721ea3348a553':
  Run black locally with nox (pdfminer#776)
  Install typing_extensions on Python 3.6 and 3.7 (pdfminer#775)
  Fix `TypeError` by Ignoring null characters in PSBaseParser (pdfminer#768)
  Fix `ValueError` with unencrypted metadata values (Fixes pdfminer#766). (pdfminer#774)
  Fix `TypeError` when getting default width of font (pdfminer#772)
  Deprecate usage of `if __name__ == "__main__"` in scripts that are not documented. Also deprecate usage of scripts that are only there for testing purposes. (pdfminer#756)
  Fix Sphinx warnings and error (pdfminer#760)
  Update CHANGELOG.md for pdfminer#755
  Remove upper version bounds (pdfminer#755)
  Ignore path constructors that do not begin with `m` (pdfminer#749)
  Bump version 20220506 & fix small issue with types