ENH: Add decrypt support for V5 and AES-128, AES-256 (R5 only) #749

exiledkingcc · 2022-04-14T16:23:45Z

Rewrite the encryption part to support V4 and AES-128 encryption (decryption ONLY for now).

I like the idea of the PyPDF2 cleanup, so this is Python 3 ONLY.

This commit needs PyCryptodome for AES operations.

The encryption part will be added some time later, and AES-256 support may be added too if it's not difficult.

exiledkingcc · 2022-04-15T03:00:04Z

The local tox test is OK with py38.
It seems GitHub failed because the dependency PyCryptodome is not installed.
Does it not read tox.ini?

MartinThoma · 2022-04-15T10:26:01Z

does it not read tox.ini?

CI (GitHub Actions) is defined here. It uses requirements/ci.txt, plus manually entered packages for Python 2.7.

MartinThoma · 2022-04-15T12:45:44Z

As this PR is rather big, I would add it in the 2.0 release or later, simply to ensure we can leave the 1.X parts soon-ish. Then you also don't have to care about 2.7 support, as it will be dropped with PyPDF2 version 2.0.

NOTICE: 1. Python3 only 2. need PyCryptodome for AES

exiledkingcc · 2022-04-18T06:38:47Z

Encryption R=6 is used by PDF 2.0.
I can't find a "PDF 2.0 specification" on the web, so it's not supported.

MasterOdin · 2022-04-18T15:59:19Z

PyPDF2/encryption.py

+
+
+try:
+    from Crypto.Cipher import ARC4, AES


What's the motivation for both attempting to use https://github.com/Legrandin/pycryptodome and falling back to an embedded implementation? It feels like we should either add the dependency or include the decryption code, but having both seems like added maintenance for little gain.

exiledkingcc · 2022-04-19

At first I used PyCryptodome. Then I found that PyPDF2 has no dependencies at all, so I tried to avoid introducing one by adding AES code to PyPDF2.

Encryption/decryption is not always needed for PDFs, so it's better to make PyCryptodome optional for users who do not need it.
Other users may prefer PyCryptodome for better performance.
So I provide both.

Maybe there should be a discussion about this?

MasterOdin · 2022-04-20

For PDF encryption / decryption in the wild, it's really just AES, right? So if we had in-house cryptography for this, the code wouldn't be too large and should be relatively stable?

exiledkingcc · 2022-04-20

The PDF specification also defines public-key algorithms for PDF encryption, but I have never seen such PDF files.
Most encrypted PDF files use only RC4 and AES, plus some hash functions.
AES code can be very stable.

MartinThoma · 2022-04-20

Thinking about it, I really would like to avoid having to maintain security-critical parts. I worry more about the encryption part than the decryption part.

If we mess up encryption, users might end up less secure than they expect. If we mess up decryption, users will just see an error.

Maybe we can have "inline imports" (importing Crypto within the encrypt / decrypt function). So we could make PyCryptodome an optional dependency that people would need to install if they want to use encryption / decryption. Maybe we could also get rid of a part of the current codebase this way?

To clarify: I don't see that I will properly maintain the crypto parts. This is the reason why I have this tendency. What do you think?

exiledkingcc · 2022-04-20

Inline imports are a good choice; actually, that's what I did at first.
When I added the AES code, I tried to make the lib independent but ignored the maintenance burden.
I agree that it's better to leave AES to PyCryptodome when users need it.
I will remove _aes.py later.
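
A minimal sketch of the inline-import approach discussed above. It is not the code from this PR: the helper name `aes_cbc_decrypt` and the locally defined `DependencyError` are chosen for illustration. The import of `Crypto` happens only when AES is actually needed, so users who never open an AES-encrypted file do not need PyCryptodome installed.

```python
class DependencyError(Exception):
    """Raised when an optional dependency is missing (defined here for the sketch)."""


def aes_cbc_decrypt(key: bytes, data: bytes) -> bytes:
    """Decrypt AES-CBC data whose first 16 bytes are the IV.

    Padding is left in place; a real implementation would strip it.
    """
    try:
        # Inline import: only evaluated when AES decryption is requested.
        from Crypto.Cipher import AES  # provided by PyCryptodome
    except ImportError as exc:
        raise DependencyError("PyCryptodome is required for AES algorithm") from exc

    iv, ciphertext = data[:16], data[16:]
    cipher = AES.new(key, AES.MODE_CBC, iv=iv)
    return cipher.decrypt(ciphertext)
```

With this layout, importing the module never fails; only a call to `aes_cbc_decrypt` without PyCryptodome installed raises `DependencyError`.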

MartinThoma · 2022-04-21T06:33:27Z

Very nice, thank you 🤗

MartinThoma · 2022-04-21T06:35:51Z

One adjustment is still necessary: setup.cfg needs a section:

[options.extras_require]
crypto = PyCryptodome

So people can install it with pip install PyPDF2[crypto]

Is there a minimum version of PyCryptodome this PR needs?

exiledkingcc · 2022-04-21T14:11:23Z

One adjustment is still necessary: setup.cfg needs a section:
[options.extras_require]
crypto = PyCryptodome
So people can install it with pip install PyPDF2[crypto]

good idea!

Is there a minimum version of PyCryptodome this PR needs?

no

MartinThoma · 2022-04-21T15:16:33Z

Nice! From my perspective this looks ready to be merged. However, due to the missing 2.7 support, it will take a while (until the PyPDF2 2.0 release).

I hope I can make that release on 1st of July. I really want to get this done: #752

If I don't get any comments on what should be part of PyPDF2 1.x?, I will start working on 2.0 on 1st of May.

MartinThoma · 2022-04-25T05:48:39Z

I think I'll add a "no-python27" pytest marker that CI uses

... on the other hand, if nobody reacts to #753 I will start working on the PyPDF2 2.0 release from 1st of May 😄

MartinThoma · 2022-04-25T05:52:22Z

I still want to go through #817 and merge

Plus maybe add a contributors.md and maybe add some magic methods / snake_case method names + deprecation warnings for camelCase method names: #751

exiledkingcc · 2022-06-15T06:07:11Z

it seems OK for me, no change is needed.

MartinThoma · 2022-06-19T05:46:59Z

PyPDF2/_reader.py

+            except DependencyError as e:
+                # make dependency error clear to users
+                raise e


Is that part necessary? What does it do? It seems to me that you just catch an exception and raise exactly the same exception again.... hence removing this block would not change the behavior?

exiledkingcc · 2022-06-19

If your PDF needs the AES algorithm to decrypt but you haven't installed pycryptodome, then without these two lines you will just get "the pdf is not decrypted". With them, you will know that you just need to install the dependency.

MartinThoma · 2022-06-19

Oh, interesting! Took me a minute to understand - thank you for pointing out that I need to look at the following two lines!
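
The point about the re-raise can be illustrated with a small sketch. It assumes, as the exchange above suggests, that a broader handler follows the `except DependencyError` clause and turns every other failure into a generic "not decrypted" result. All names here are hypothetical and not taken from `_reader.py`.

```python
class DependencyError(Exception):
    """Stand-in for the error raised when PyCryptodome is missing."""


def aes_step() -> None:
    # Hypothetical decrypt step that needs AES while PyCryptodome is absent.
    raise DependencyError("PyCryptodome is required for AES algorithm")


def decrypt_without_reraise() -> str:
    try:
        aes_step()
    except Exception:
        # The broad handler swallows the DependencyError as well.
        return "not decrypted"
    return "decrypted"


def decrypt_with_reraise() -> str:
    try:
        aes_step()
    except DependencyError:
        # Let the specific, actionable error reach the user.
        raise
    except Exception:
        return "not decrypted"
    return "decrypted"
```

The first variant hides the missing dependency behind a generic message, while the second one surfaces it. A bare `raise` is used here instead of `raise e`; both propagate the same exception.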

The highlight of this release is improved support for file encryption (AES-128 and AES-256, R5 only). See #749 for the amazing work of @exiledkingcc 🎊 Thank you 🤗

Deprecations (DEP):
- Rename names to be PEP8-compliant (#967)
- `PdfWriter.get_page`: the pageNumber parameter is renamed to page_number
- `PyPDF2.filters`:
  * For all classes, a parameter rename: decodeParms ➔ decode_parms
  * decodeStreamData ➔ decode_stream_data
- `PyPDF2.xmp`:
  * XmpInformation.rdfRoot ➔ XmpInformation.rdf_root
  * XmpInformation.xmp_createDate ➔ XmpInformation.xmp_create_date
  * XmpInformation.xmp_creatorTool ➔ XmpInformation.xmp_creator_tool
  * XmpInformation.xmp_metadataDate ➔ XmpInformation.xmp_metadata_date
  * XmpInformation.xmp_modifyDate ➔ XmpInformation.xmp_modify_date
  * XmpInformation.xmpMetadata ➔ XmpInformation.xmp_metadata
  * XmpInformation.xmpmm_documentId ➔ XmpInformation.xmpmm_document_id
  * XmpInformation.xmpmm_instanceId ➔ XmpInformation.xmpmm_instance_id
- `PyPDF2.generic`:
  * readHexStringFromStream ➔ read_hex_string_from_stream
  * initializeFromDictionary ➔ initialize_from_dictionary
  * createStringObject ➔ create_string_object
  * TreeObject.hasChildren ➔ TreeObject.has_children
  * TreeObject.emptyTree ➔ TreeObject.empty_tree

New Features (ENH):
- Add decrypt support for V5 and AES-128, AES-256 (R5 only) (#749)

Robustness (ROB):
- Fix corrupted (wrongly) linear PDF (#1008)

Maintenance (MAINT):
- Move PDF_Samples folder into ressources
- Fix typos (#1007)

Testing (TST):
- Improve encryption/decryption test (#1009)
- Add merger test cases with real PDFs (#1006)
- Add mutmut config

Code Style (STY):
- Put pure data mappings in separate files (#1005)
- Make encryption module private, apply pre-commit (#1010)

Full Changelog: 2.2.1...2.3.0

xilopaint · 2022-06-19T11:26:08Z

Is there any encrypted PDF file available that I could use for testing this PR? I mean some file that I couldn't decrypt before and now it's possible.

pubpub-zz · 2022-06-19T11:33:15Z

@xilopaint, test samples are included in this PR for autotest

MartinThoma · 2022-06-19T11:35:23Z

@xilopaint You can also easily create them with qpdf:

$ qpdf --encrypt foo bar 256 --force-R5 -- resources/crazyones.pdf crazyones-256.pdf

and then run

from PyPDF2 import PdfReader

reader = PdfReader("crazyones-256.pdf", password="foo")
print(reader.pages[0].extract_text())

If you checkout 2.2.1 you will get:

Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    reader = PdfReader("crazyones-256.pdf", password="foo")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 262, in __init__
    if password is not None and self.decrypt(password) == 0:
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1610, in decrypt
    return self._decrypt(password)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1651, in _decrypt
    f"only algorithm code 1 and 2 are supported. This PDF uses code {encrypt_v}"
NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 5

xilopaint · 2022-06-19T12:42:26Z

@MartinThoma the v2.3.0 is broken for me. _codecs is not being installed by pip and I'm getting ModuleNotFoundError: No module named 'PyPDF2._codecs'. To make it work I had to clone this repo and copy/paste the _codecs folder.

Also, for using the new enhanced decryption I had to install pycryptodome because it was not included as a dependency.

MartinThoma · 2022-06-19T12:57:28Z

@xilopaint Thank you for pointing it out - I just fixed it and released PyPDF2==2.3.1

pycryptodome is an optional dependency which you can install via pip install PyPDF2[crypto] (depending on your shell, you might need to escape the [ and ]).
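
One way to check from Python whether the optional dependency is present is sketched below. The helper name `has_pycryptodome` is made up for this example. Note that it only tests whether a top-level `Crypto` package is importable, which the legacy PyCrypto package would also satisfy.

```python
import importlib.util


def has_pycryptodome() -> bool:
    """Return True if a top-level `Crypto` package can be imported."""
    return importlib.util.find_spec("Crypto") is not None


print(f"AES decryption available: {has_pycryptodome()}")
```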

xilopaint · 2022-06-19T13:09:16Z

(depending on your shell, you might need to escape the [ and ])

Yeah, I had tried pip install pypdf2[crypto] before and it didn't work in my zsh shell. Now using double quotes it does work.

What's the reason for pycryptodome being an optional dependency? Just making the package more lightweight for people who don't need the enhanced encryption?

MartinThoma · 2022-06-19T14:20:45Z

What's the reason for pycryptodome being an optional dependency? Just making the package more lightweight for people who don't need the enhanced encryption?

Yes exactly! Quite a lot of users don't need any encryption / decryption capabilities.

Additionally, pycryptodome is not a pure-Python dependency. That means that if it were a non-optional dependency, it might make PyPDF2 much harder or even impossible to install / use in some cases.

xilopaint · 2022-06-19T16:46:41Z

@MartinThoma my tests are failing in GitHub Actions since I've installed pypdf2[crypto]. Look:

https://github.com/xilopaint/alfred-pdf-tools/runs/6956399382?check_suite_focus=true

Although they're passing locally with no errors. Could you help me out?

MartinThoma · 2022-06-19T17:01:04Z

I just had a quick glance, but it seems like a Linux kernel module is missing. That is exactly the kind of reason why I wanted this dependency to be optional.

If you don't install the crypto part, the pure-python implementation is used. That still offers encryption / decryption, but only older algorithms
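
For context, the "older algorithms" most likely refers to RC4, which the thread above names as the other cipher commonly found in encrypted PDFs. RC4 is small enough to write in pure Python. The following is a generic textbook sketch, not PyPDF2's own implementation, and `rc4` is a name chosen for illustration.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` with RC4 (the operation is symmetric)."""
    # Key-scheduling algorithm: permute the state based on the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation: XOR each byte with the keystream.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())  # bbf316e8d940af0ad3
```

Applying `rc4` again with the same key returns the original plaintext. RC4 is considered weak by modern standards, which is one reason AES support matters.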

xilopaint · 2022-10-23T12:03:02Z

Hey @MartinThoma, what does R5 mean in the context of cryptography?

MartinThoma · 2022-10-23T12:30:35Z

That is a PDF specific thing. It's short for "revision". You can read more in the specs in the "Standard Encryption Dictionary" chapter.
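
As a rough orientation, the algorithm code (V) and revision (R) values mentioned in this thread can be summarized in a small lookup. The sketch below relies on statements made above (codes 1 and 2 supported before this PR, V4 with AES-128, V5/R5 with AES-256, R6 belonging to PDF 2.0 and unsupported) plus the fact that codes 1 and 2 denote RC4 in the PDF specification. The function name and return strings are hypothetical.

```python
def describe_encryption(v: int, r: int) -> str:
    """Map the V and R values of an encryption dictionary to a short description."""
    if v in (1, 2):
        return "RC4 (supported before this PR)"
    if v == 4:
        return "crypt filters, including AES-128 (added by this PR)"
    if v == 5 and r == 5:
        return "AES-256 (added by this PR)"
    if v == 5 and r == 6:
        return "AES-256 as used by PDF 2.0 (not supported by this PR)"
    return "unknown"


print(describe_encryption(5, 5))
```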

xilopaint · 2022-10-23T12:32:28Z

Thanks!

MartinThoma added this to the PyPDF2 version 2.0.0 milestone Apr 14, 2022

MartinThoma added Feature Large labels Apr 15, 2022

exiledkingcc added 6 commits April 16, 2022 23:12

decrypt support V4 and AES-128

721b5b6

NOTICE: 1. Python3 only 2. need PyCryptodome for AES

fix and update test

2f8e33a

add pure python AES

26af6ea

FIX: allow use owner password to decrypt

46e1da3

FIX: merge encrypted pdf

970125e

decrypt support V=5 and R=5, which uses AES-256

47cfbd5

Merge branch 'master' into encryption

28cd603

exiledkingcc changed the title ~~decrypt support V4 and AES-128~~ decrypt support V5 and AES-128, AES-256(R5 only) Apr 18, 2022

MasterOdin reviewed Apr 18, 2022

remove AES code for easier maintaining

84532bb

add pycryptodome to extras_require

87bf5b1

exiledkingcc added 3 commits April 24, 2022 16:09

Merge branch 'master' into encryption

0348842

allow decrypt password to be bytes

fa439c3

Merge branch 'master' into encryption

63788f6

exiledkingcc added 3 commits April 27, 2022 21:47

Merge branch 'master' into encryption

1425f65

Merge branch 'master' into encryption

ba45481

make flake8 happy

2cfece2

MartinThoma added 3 commits June 16, 2022 14:13

Merge branch 'main' into encryption

45de13f

Merge branch 'main' into encryption

49125b5

Merge branch 'main' into encryption

e201761

MartinThoma changed the title ~~ENH: decrypt support V5 and AES-128, AES-256 (R5 only)~~ ENH: Add decrypt support for V5 and AES-128, AES-256 (R5 only) Jun 19, 2022

Merge branch 'main' into encryption

d628afe

MartinThoma reviewed Jun 19, 2022

MartinThoma reviewed Jun 19, 2022

MartinThoma added 2 commits June 19, 2022 08:16

Add pragma no-cover to base class

6172a51

Merge branch 'main' into encryption

8cc4a89

MartinThoma merged commit 868f977 into py-pdf:main Jun 19, 2022

MartinThoma removed the "soon" label Jun 19, 2022

MartinThoma mentioned this pull request Jun 19, 2022

STY: Make encryption module private, apply pre-commit #1010

Merged

MartinThoma added a commit that referenced this pull request Jun 19, 2022

STY: Make encryption module private, apply pre-commit (#1010)

797963a

Related to #749

exiledkingcc deleted the encryption branch June 20, 2022 03:38

exiledkingcc mentioned this pull request May 3, 2023

Decrypting pdf owner password fails - treats /O value as unicode #557

Closed
