🐛 with extract-annotated-pages command
wolfram77 committed Feb 6, 2025
1 parent 7518a6c commit 575f137
Showing 3 changed files with 82 additions and 24 deletions.
49 changes: 25 additions & 24 deletions README.md
@@ -26,23 +26,24 @@ $ pdfly --help

pdfly is a pure-python cli application for manipulating PDF files.

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --version │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────╮
│ 2-up Create a booklet-style PDF from a single input. │
│ cat Concatenate pages from PDF files into a single PDF file. │
│ compress Compress a PDF. │
| uncompress Uncompresses a PDF. │
│ extract-images Extract images from PDF without resampling or altering. │
│ extract-text Extract text from a PDF file. │
│ meta Show metadata of a PDF file │
│ pagemeta Give details about a single page. │
│ rm Remove pages from PDF files. │
│ update-offsets Updates offsets and lengths in a simple PDF file. │
│ x2pdf Convert one or more files to PDF. Each file is a page. │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --version │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────╮
│ 2-up Create a booklet-style PDF from a single input. │
│ cat Concatenate pages from PDF files into a single PDF file. │
│ compress Compress a PDF. │
│ uncompress Uncompresses a PDF. │
│ extract-annotated-pages Extract only the annotated pages from a PDF. │
│ extract-images Extract images from PDF without resampling or altering. │
│ extract-text Extract text from a PDF file. │
│ meta Show metadata of a PDF file │
│ pagemeta Give details about a single page. │
│ rm Remove pages from PDF files. │
│ update-offsets Updates offsets and lengths in a simple PDF file. │
│ x2pdf Convert one or more files to PDF. Each file is a page. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```

You can see the help of every subcommand by typing:
@@ -56,13 +57,13 @@ $ pdfly 2-up --help
Pairs of two pages will be put on one page (left and right)
usage: python 2-up.py input_file output_file

╭─ Arguments ─────────────────────────────────────────────────────────────────╮
│ * pdf PATH [default: None] [required] │
│ * out PATH [default: None] [required] │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Arguments ──────────────────────────────────────────────────────────────────────────╮
│ * pdf PATH [default: None] [required] │
│ * out PATH [default: None] [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```

## Contributors ✨
25 changes: 25 additions & 0 deletions pdfly/cli.py
@@ -13,6 +13,7 @@
import pdfly.booklet
import pdfly.cat
import pdfly.compress
import pdfly.extract_annotated_pages
import pdfly.extract_images
import pdfly.metadata
import pdfly.pagemeta
@@ -319,3 +320,27 @@ def x2pdf(
    exit_code = pdfly.x2pdf.main(x, output)
    if exit_code:
        raise typer.Exit(code=exit_code)


@entry_point.command(name="extract-annotated-pages", help=pdfly.extract_annotated_pages.__doc__)  # type: ignore[misc]
def extract_annotated_pages(
    input_pdf: Annotated[
        Path,
        typer.Argument(
            dir_okay=False,
            exists=True,
            resolve_path=True,
            help="Input PDF file.",
        ),
    ],
    output_pdf: Annotated[
        Optional[Path],
        typer.Option(
            "--output",
            "-o",
            writable=True,
            help="Output PDF file. Defaults to the input file name with an '_annotated' suffix, e.g. 'doc_annotated.pdf'.",
        ),
    ] = None,
) -> None:
    pdfly.extract_annotated_pages.main(input_pdf, output_pdf)
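When `--output` is omitted, `main()` derives the output path from the input by appending `_annotated` to the file stem, so `docs/report.pdf` becomes `docs/report_annotated.pdf` in the same directory. The sketch below isolates that rule; `default_output_path` is a hypothetical helper, not part of pdfly. It uses `Path.with_name`, which gives the same result as `with_stem(stem + "_annotated")` but is also available before Python 3.9.

```python
from pathlib import Path
from typing import Optional


def default_output_path(input_pdf: Path, output_pdf: Optional[Path] = None) -> Path:
    """Hypothetical helper mirroring the default chosen in main()."""
    if output_pdf:
        # An explicit --output / -o value always wins
        return output_pdf
    # Equivalent to input_pdf.with_stem(input_pdf.stem + "_annotated")
    return input_pdf.with_name(f"{input_pdf.stem}_annotated{input_pdf.suffix}")


print(default_output_path(Path("docs/report.pdf")))
print(default_output_path(Path("docs/report.pdf"), Path("out.pdf")))
```

On the command line this corresponds to `pdfly extract-annotated-pages report.pdf` (default name) versus `pdfly extract-annotated-pages report.pdf -o out.pdf` (explicit name).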
32 changes: 32 additions & 0 deletions pdfly/extract_annotated_pages.py
@@ -0,0 +1,32 @@
"""
Extract only the annotated pages from a PDF.
Q: Why does this help?
A: https://github.com/py-pdf/pdfly/issues/97
"""

from pathlib import Path
from pypdf import PdfReader, PdfWriter


# Check if an annotation is manipulable.
def is_manipulable(annot) -> bool:
return annot.get("/Subtype") not in ["/Link"]


# Main function.
def main(input_pdf: Path, output_pdf: Path) -> None:
if not output_pdf:
output_pdf = input_pdf.with_stem(input_pdf.stem + "_annotated")
input = PdfReader(input_pdf)
output = PdfWriter()
output_pages = 0
# Copy only the pages with annotations
for page in input.pages:
if not "/Annots" in page: continue
if not any(is_manipulable(annot) for annot in page["/Annots"]): continue
output.add_page(page)
output_pages += 1
# Save the output PDF
output.write(output_pdf)
print(f"Extracted {output_pages} pages with annotations to {output_pdf}")
