Invalid CMap table generated

This is a repro for a UTF-8 character that cannot be recovered from a PDF generated by WeasyPrint.

Steps for reproduction:

  1. weasyprint example.html example.pdf

  2. cp example.pdf example_fixed.pdf

  3. Use a hex editor to fix the CMap table of example_fixed.pdf by changing

       <10006c5b> <6c5b>

     to

       <00006c5b> <6c5b>

  4. ./p2t.py

  5. See that the character has been recovered in example_fixed.txt but not in example.txt

Step 3 will no longer be necessary once the issue described in Kozea/WeasyPrint#1571 (comment) is fixed.
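
If you prefer not to use a hex editor, step 3 can be scripted. The following is a minimal sketch, not part of this repro: the names patch_cmap and patch_file are hypothetical, and it assumes the CMap stream is stored uncompressed in the PDF (which the hex-editor step implies). Because both byte sequences have the same length, the replacement does not shift any offsets in the file.

```python
from pathlib import Path

# Byte sequences taken from step 3. Both have the same length, so the
# PDF's cross-reference offsets stay valid after the replacement.
BROKEN = b"<10006c5b> <6c5b>"
FIXED = b"<00006c5b> <6c5b>"


def patch_cmap(data: bytes) -> bytes:
    """Return data with the broken CMap entry replaced by the fixed one."""
    if BROKEN not in data:
        # Assumption: the entry is visible as plain bytes. If the stream
        # is compressed, this simple approach cannot find it.
        raise ValueError("broken CMap entry not found (stream may be compressed)")
    return data.replace(BROKEN, FIXED)


def patch_file(src: str = "example.pdf", dst: str = "example_fixed.pdf") -> None:
    """Read src, patch the CMap entry, and write the result to dst."""
    Path(dst).write_bytes(patch_cmap(Path(src).read_bytes()))
```

Calling patch_file() in the directory that contains example.pdf would replace steps 2 and 3, since it writes the patched copy directly to example_fixed.pdf.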

Requirements

This requires the Python packages weasyprint (version 54.1) and pdftotext.

If you have a working Nix setup, use the provided default.nix by running

nix-shell

If you have direnv and Nix, just run

direnv allow