Feature proposition: list text objects, their size and location found in a pdf file #92

d-ph · 2024-08-12T10:12:40Z

Hello,

Similar to how cpdf can list images with the -image-resolution operation, would it be possible to add a cpdf operation that lists text objects (most importantly, their size and location) found in a PDF?

The caveat is that text that has been converted to vector outlines would not be detected by the new operation, which is understandable.

Regards.


johnwhitington · 2024-08-22T08:22:10Z

There are two tasks here:

1. Parse PDF page content to locate objects on the page; and
2. Do PDF text extraction.

The first will be coming soon. The second will happen, but only for well-behaved modern PDFs. I don't want to get into the full field of PDF text extraction - it's a complex thing.

d-ph · 2024-08-23T08:49:00Z

Understood and fair. Thanks for the information and explanation 👍
