PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.

kyliemsauter/pdfparser

This branch is 27 commits behind smalot/pdfparser:master.

PDF parser

smalot/pdfparser is a standalone PHP package that provides various tools to extract data from PDF files.

This library is actively maintained. Its author is not developing it further at the moment, but we welcome any pull request that adds or extends functionality!

Features

  • Load/parse objects and headers
  • Extract metadata (author, description, ...)
  • Extract text from ordered pages
  • Support for compressed PDFs
  • Support for Mac OS Roman charset encoding
  • Handling of hexadecimal and octal encoding in text sections
  • Custom configurations (see CustomConfig.md)
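
As a rough illustration of the custom configuration feature, the sketch below builds a parser from a `Config` object. The helper name `buildConfiguredParser` and the option values are illustrative assumptions, not recommendations; CustomConfig.md remains the reference for which options exist and what they do. The helper returns null when the library has not been loaded, so the snippet can be run safely on its own.

```php
<?php

/**
 * Build a parser with a custom configuration.
 *
 * Hypothetical helper for illustration only. Returns null when the library
 * is not loaded (for example, Composer's autoloader was not included).
 */
function buildConfiguredParser(int $fontSpaceLimit = -60)
{
    if (!class_exists('\Smalot\PdfParser\Config')) {
        return null;
    }

    $config = new \Smalot\PdfParser\Config();
    // Illustrative values: tune spacing detection, skip image content.
    $config->setFontSpaceLimit($fontSpaceLimit);
    $config->setRetainImageContent(false);

    // The configuration object is passed as the second constructor argument.
    return new \Smalot\PdfParser\Parser([], $config);
}

$parser = buildConfiguredParser();
if ($parser !== null) {
    echo $parser->parseFile('/path/to/document.pdf')->getText();
}
```

Keeping the configuration in one helper makes it easy to reuse the same settings for every document a script processes.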

Secured documents and form data extraction are not currently supported.
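
Because some documents cannot be parsed, it can help to wrap extraction in a guard. The following is a minimal sketch, assuming the parser signals unsupported input (such as a secured file) by throwing an exception; `extractOrNull` is a hypothetical helper name and the file path is a placeholder.

```php
<?php

/**
 * Run a text-extraction callback and return null instead of throwing.
 *
 * Hypothetical helper: the failure reason is written to the PHP error log.
 */
function extractOrNull(callable $extract): ?string
{
    try {
        return $extract();
    } catch (\Exception $e) {
        error_log('PDF could not be parsed: ' . $e->getMessage());
        return null;
    }
}

// Runs only when the library has been loaded (e.g. via Composer's autoloader).
if (class_exists('\Smalot\PdfParser\Parser')) {
    $text = extractOrNull(function () {
        $parser = new \Smalot\PdfParser\Parser();
        return $parser->parseFile('/path/to/secured.pdf')->getText();
    });
    echo $text === null ? 'No text extracted.' : $text;
}
```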

License

This library is under the LGPLv3 license.

Install

Since v1, this library requires PHP 7.1 or later. You can install it via Composer:

composer require smalot/pdfparser

If you can't use Composer, include alt_autoload.php-dist instead. It loads all required files automatically.
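
A Composer-free setup might look like the sketch below. The directory `pdfparser` next to the script is an assumed location for the unpacked library, and `loadPdfParser` is a hypothetical helper; adjust the path to wherever alt_autoload.php-dist actually lives.

```php
<?php

/**
 * Include the library's alternative autoloader from the given directory.
 *
 * Hypothetical helper: returns false when the file is missing, true when
 * the Parser class is available after the include.
 */
function loadPdfParser(string $libraryDir): bool
{
    $autoload = rtrim($libraryDir, '/') . '/alt_autoload.php-dist';
    if (!is_file($autoload)) {
        return false;
    }

    require_once $autoload;

    return class_exists('\Smalot\PdfParser\Parser');
}

// Assumed location of the unpacked library, relative to this script.
if (loadPdfParser(__DIR__ . '/pdfparser')) {
    $parser = new \Smalot\PdfParser\Parser();
    echo $parser->parseFile('/path/to/document.pdf')->getText();
}
```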

Quick example

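<?php

// Load Composer's autoloader (path assumes this script sits in the project root).
require_once __DIR__ . '/vendor/autoload.php';

// Parse PDF file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/path/to/document.pdf');

// Extract the text of all pages and print it.
$text = $pdf->getText();
echo $text;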
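
Beyond the full text, the parsed document can also be queried for metadata and individual pages, as listed under Features. The sketch below is one possible way to print both. `formatDetails` is a hypothetical helper, and the example assumes the library is already loaded and that `getDetails()` returns an array of property names and values.

```php
<?php

/**
 * Format a metadata array as "Property: value" lines.
 *
 * Hypothetical helper: array values are JSON-encoded so nested data stays readable.
 */
function formatDetails(array $details): string
{
    $lines = [];
    foreach ($details as $property => $value) {
        if (is_array($value)) {
            $value = json_encode($value);
        }
        $lines[] = $property . ': ' . $value;
    }

    return implode(PHP_EOL, $lines);
}

// Runs only when the library has been loaded (e.g. via Composer's autoloader).
if (class_exists('\Smalot\PdfParser\Parser')) {
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile('/path/to/document.pdf');

    // Metadata such as author or title.
    echo formatDetails($pdf->getDetails()), PHP_EOL;

    // Text of each page, in page order.
    foreach ($pdf->getPages() as $index => $page) {
        echo 'Page ', $index + 1, ': ', $page->getText(), PHP_EOL;
    }
}
```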

Further usage information can be found in the usage documentation (doc/Usage.md).

Documentation

Documentation can be found in the doc folder.
