This sample project helps you get started with the Adobe PDF Services Python SDK.
The sample classes illustrate how to perform PDF-related actions (such as converting to and from the PDF format) using the SDK. Please note that the Adobe PDF Services Python SDK supports only server-side use cases.
The sample application has the following requirements:
- Python: Version 3.10 or above. Python installation instructions can be found here.
The credentials file for the samples is pdfservices-api-credentials.json.
Before running the samples, set the environment variables PDF_SERVICES_CLIENT_ID and PDF_SERVICES_CLIENT_SECRET to the values in the pdfservices-api-credentials.json file, which is downloaded at the end of credential creation in the Get Started workflow, by running the following commands:
- For macOS/Linux users:
export PDF_SERVICES_CLIENT_ID=<YOUR CLIENT ID>
export PDF_SERVICES_CLIENT_SECRET=<YOUR CLIENT SECRET>
- For Windows users:
SET PDF_SERVICES_CLIENT_ID=<YOUR CLIENT ID>
SET PDF_SERVICES_CLIENT_SECRET=<YOUR CLIENT SECRET>
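Once the variables are set, a script can check for them with the standard library before any SDK object is constructed. The following is a minimal sketch and not part of the samples: the helper name `read_pdf_services_credentials` is hypothetical, and only the two variable names are taken from the instructions above.

```python
import os


def read_pdf_services_credentials(env=None):
    """Return (client_id, client_secret), or raise if either variable is unset.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is illustrative and is not provided by the SDK.
    """
    env = os.environ if env is None else env
    names = ("PDF_SERVICES_CLIENT_ID", "PDF_SERVICES_CLIENT_SECRET")
    # Treat empty strings the same as unset variables.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Failing early with the names of the missing variables tends to be easier to diagnose than an authentication error raised later by an API call.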
The SDK supports custom socket and connect timeouts for API calls. Please refer to this section to learn more.
Additionally, the SDK can be configured to process documents in a specified region. Please refer to this section to learn more.
If you receive a ServiceUsageError while running the samples, the trial credentials have exhausted their usage quota. Please contact us to get paid credentials.
Install the dependencies for the samples as listed in the requirements.txt file with this command:
pip install -r requirements.txt
The SDK uses the Python standard logging module. Customize the logging settings as needed.
Default Logging Config:
logging.getLogger(__name__).addHandler(logging.NullHandler())
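Because the default configuration attaches only a NullHandler, no log output appears until the application configures a handler of its own. The sketch below shows one way to do that with the standard logging module; the function name `enable_sample_logging` and the format string are illustrative choices, and it assumes the SDK's loggers propagate to the root logger, which is the logging module's default behavior.

```python
import logging


def enable_sample_logging(level=logging.INFO):
    """Attach a console handler to the root logger and return the root logger."""
    root = logging.getLogger()
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root.addHandler(handler)
    root.setLevel(level)
    return root
```

Calling `enable_sample_logging(logging.DEBUG)` once at the start of a script is enough; calling it repeatedly would add duplicate handlers.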
The following sub-sections describe how to run the samples. Prior to running the samples, check that the credentials are set up as described above and that the dependencies have been installed.
The code itself is in the src folder. Test files used by the samples can be found in resources/. When executed, all samples create an output child folder under the project root directory to store their results.
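For readers writing their own scripts in the same layout, the following sketch builds a result path under an output folder and creates the folder if needed. The per-sample subfolder and the timestamped file name are assumptions made for illustration; the text above only states that results go into an output child folder under the project root.

```python
from datetime import datetime
from pathlib import Path


def make_output_path(project_root, sample_name, extension):
    """Return a timestamped file path under <project_root>/output/<sample_name>/.

    The directory is created if it does not exist. The naming scheme is a
    hypothetical convention, not necessarily the one the samples use.
    """
    out_dir = Path(project_root) / "output" / sample_name
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    return out_dir / f"{sample_name}_{stamp}.{extension}"
```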
These samples illustrate how to convert files of supported formats to PDF. Refer to the Create PDF API documentation for the list of all supported media types that can be converted to PDF.
The sample class create_pdf_from_docx.py
creates a PDF file from a DOCX file.
python src/createpdf/create_pdf_from_docx.py
The sample class create_pdf_from_docx_with_options.py
creates a PDF file from a DOCX file by setting documentLanguage to
the language of the input file.
python src/createpdf/create_pdf_from_docx_with_options.py
The sample class create_pdf_from_pptx.py
creates a PDF file from a PPTX file.
python src/createpdf/create_pdf_from_pptx.py
These samples illustrate how to convert HTML to PDF. Refer to the HTML to PDF API documentation for instructions on the structure of the zip file.
The sample class html_with_inline_css_to_pdf.py
creates a PDF file from an input HTML file with inline CSS.
python src/htmltopdf/html_with_inline_css_to_pdf.py
The sample class html_to_pdf_from_url.py
creates a PDF file from an HTML specified via URL.
python src/htmltopdf/html_to_pdf_from_url.py
The sample class static_html_to_pdf.py
creates a PDF file from a zip file containing the input HTML file and its resources.
python src/htmltopdf/static_html_to_pdf.py
The sample class dynamic_html_to_pdf.py
converts a zip file, containing the input HTML file and its resources, along
with the input data to a PDF file. The input data is used by the JavaScript in the HTML file to manipulate the HTML DOM,
thus effectively updating the source HTML file. This mechanism can be used to provide data to the template HTML
dynamically and then convert it into a PDF file.
python src/htmltopdf/dynamic_html_to_pdf.py
These samples illustrate how to export PDF files to other formats. Refer to the Export PDF API documentation and the Export PDF To Images API documentation for supported export formats.
The sample class export_pdf_to_docx.py
converts a PDF file to a DOCX file.
python src/exportpdf/export_pdf_to_docx.py
The sample class export_pdf_to_docx_with_ocr_option.py
converts a PDF file to a DOCX file. OCR processing is also performed on the input PDF file to extract text from images in the document.
python src/exportpdf/export_pdf_to_docx_with_ocr_option.py
The sample class export_pdf_to_jpeg.py
converts a PDF file's pages to a list of JPEG images.
python src/exportpdftoimages/export_pdf_to_jpeg.py
The sample class export_pdf_to_jpeg_zip.py
converts a PDF file's pages to JPEG images. The resulting file is a ZIP archive containing one image per page of the source PDF file.
python src/exportpdftoimages/export_pdf_to_jpeg_zip.py
These samples illustrate how to combine multiple PDF files into a single PDF file.
The sample class combine_pdf.py
combines multiple PDF files into a single PDF file. The combined PDF file contains all pages
of the source files.
python src/combinepdf/combine_pdf.py
The sample class combine_pdf_with_page_ranges.py
combines specific pages of multiple PDF files into a single PDF file.
python src/combinepdf/combine_pdf_with_page_ranges.py
These samples illustrate how to apply OCR (Optical Character Recognition) to a PDF file and convert it to a searchable copy of your PDF. The supported input format is application/pdf.
The sample class ocr_pdf.py
converts a PDF file into a searchable PDF file.
python src/ocrpdf/ocr_pdf.py
The sample class ocr_pdf_with_options.py
converts a PDF file to a searchable PDF file with maximum fidelity to the original
image and default en-us locale. Refer to the documentation of OCRSupportedLocale and OCRSupportedType to see
the list of supported OCR locales and OCR types.
python src/ocrpdf/ocr_pdf_with_options.py
These samples illustrate how to reduce the size of a PDF file.
The sample class compress_pdf.py
reduces the size of a PDF file.
python src/compresspdf/compress_pdf.py
The sample class compress_pdf_with_options.py
reduces the size of a PDF file based on the provided compression level.
Refer to the documentation of CompressionLevel to see the list of supported compression levels.
python src/compresspdf/compress_pdf_with_options.py
The sample illustrates how to convert a PDF file into a Linearized (also known as "web optimized") PDF file. Such PDF files are optimized for incremental access in network environments.
The sample class linearize_pdf.py
optimizes the PDF file for a faster Web View.
python src/linearizepdf/linearize_pdf.py
These samples illustrate how to secure a PDF file with a password.
The sample class protect_pdf.py
converts a PDF file into a password protected PDF file.
python src/protectpdf/protect_pdf.py
The sample class protect_pdf_with_owner_password.py
secures an input PDF file with an owner password and allows certain access permissions
such as copying and editing the contents, and printing the document at low resolution.
python src/protectpdf/protect_pdf_with_owner_password.py
The sample illustrates how to remove password security from a PDF document.
The sample class remove_protection.py
removes password security from a secured PDF document.
python src/removeprotection/remove_protection.py
The sample illustrates how to rotate pages in a PDF file.
The sample class rotate_pdf_pages.py
rotates specific pages in a PDF file.
python src/rotatepages/rotate_pdf_pages.py
The sample illustrates how to delete pages in a PDF file.
The sample class delete_pdf_pages.py
removes specific pages from a PDF file.
python src/deletepages/delete_pdf_pages.py
The sample illustrates how to reorder the pages in a PDF file.
The sample class reorder_pdf_pages.py
rearranges the pages of a PDF file according to the specified order.
python src/reorderpages/reorder_pdf_pages.py
The sample illustrates how to insert pages in a PDF file.
The sample class insert_pdf_pages.py
inserts pages of multiple PDF files into a base PDF file.
python src/insertpages/insert_pdf_pages.py
The sample illustrates how to replace pages of a PDF file.
The sample class replace_pdf_pages.py
replaces specific pages in a PDF file with pages from multiple PDF files.
python src/replacepages/replace_pdf_pages.py
These samples illustrate how to split a PDF file into multiple PDF files.
The sample class split_pdf_by_number_of_pages.py
splits the input PDF into multiple PDF files on the basis of the maximum number
of pages each of the output files can have.
python src/splitpdf/split_pdf_by_number_of_pages.py
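To make the page-count mode concrete, the sketch below computes the 1-based page ranges that result when a document is cut into consecutive chunks of at most a given size. It illustrates the arithmetic only: the service performs the actual split, and the assumption that files are filled in page order with the remainder in the last file is ours, not a statement from the API documentation. The function name is hypothetical.

```python
def page_ranges_for_page_count(total_pages, max_pages_per_file):
    """Return inclusive, 1-based (start, end) ranges of at most max_pages_per_file pages."""
    if total_pages < 1 or max_pages_per_file < 1:
        raise ValueError("total_pages and max_pages_per_file must be positive")
    return [
        (start, min(start + max_pages_per_file - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages_per_file)
    ]


# A 10-page document with at most 3 pages per file yields four ranges,
# the last one holding the single leftover page.
print(page_ranges_for_page_count(10, 3))  # [(1, 3), (4, 6), (7, 9), (10, 10)]
```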
The sample class split_pdf_into_number_of_files.py
splits the input PDF into multiple PDF files on the basis of the number
of documents.
python src/splitpdf/split_pdf_into_number_of_files.py
The sample class split_pdf_by_page_ranges.py
splits the input PDF into multiple PDF files on the basis of page ranges.
Each page range corresponds to a single output file having the pages specified in the page range.
python src/splitpdf/split_pdf_by_page_ranges.py
The Adobe Document Merge operation allows you to produce high-fidelity PDF and Word documents with dynamic data inputs. Using this operation, you can merge your JSON data with Word templates to create dynamic documents for contracts and agreements, invoices, proposals, reports, forms, branded marketing documents and more. To learn more about document generation and document templates, please check out the documentation.
The sample class merge_document_to_docx.py
merges the Word based document template with the input JSON data to generate
the output document in the DOCX format.
python src/documentmerge/merge_document_to_docx.py
The sample class merge_document_to_docx_with_fragments.py
merges the Word based document template with the input JSON data and fragments JSON to generate
the output document in the DOCX format.
python src/documentmerge/merge_document_to_docx_with_fragments.py
The sample class merge_document_to_pdf.py
merges the Word based document template with the input JSON data to generate
the output document in the PDF format.
python src/documentmerge/merge_document_to_pdf.py
These samples illustrate how to apply an electronic seal to PDF documents such as agreements, invoices, proposals, reports, forms, branded marketing documents and more. To learn more about PDF Electronic Seal, please see the documentation. The following details need to be updated before executing these samples: PROVIDER_NAME, ACCESS_TOKEN, CREDENTIAL_ID and PIN.
The sample class electronic_seal.py
uses the sealing options with default appearance options to apply an electronic seal to the PDF document.
python src/electronicseal/electronic_seal.py
The sample class electronic_seal_with_appearance_options.py
uses the sealing options with custom appearance options to apply an electronic seal to the PDF document.
python src/electronicseal/electronic_seal_with_appearance_options.py
The sample class electronic_seal_with_time_stamp_authority.py
uses a time stamp authority to apply an electronic seal with a trusted timestamp to the PDF document.
python src/electronicseal/electronic_seal_with_time_stamp_authority.py
These samples illustrate how to extract the content of a PDF in a structured JSON format, along with the renditions inside the PDF. The output of the SDK extract operation is a zip package. The zip package consists of the following:
- The structuredData.json file with the extracted content and PDF element structure. See the JSON schema. Please refer to the Styling JSON schema for a description of the output when the styling option is enabled.
- One or more renditions folders containing renditions for each element type selected as input. The folder name is either “tables” or “figures” depending on your specified element type. Each folder contains renditions with filenames that correspond to the element information in the JSON file.
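The package layout described above can be inspected with the standard library alone. The following sketch opens the zip package, parses structuredData.json, and lists any files under the tables/ or figures/ folders. The function name `read_structured_data` is hypothetical, and the code makes no assumption about the keys inside the JSON; consult the JSON schema for those.

```python
import json
import zipfile


def read_structured_data(zip_source):
    """Return (parsed structuredData.json, list of rendition file names).

    `zip_source` may be a path or a binary file-like object holding the
    zip package produced by the extract operation.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open("structuredData.json") as handle:
            data = json.load(handle)
        # Renditions, when requested, live under "tables/" or "figures/".
        renditions = [
            name
            for name in archive.namelist()
            if name.startswith(("tables/", "figures/"))
        ]
    return data, renditions
```

If the package does not contain structuredData.json, `archive.open` raises `KeyError`, which a caller may want to catch and report.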
The sample class extract_text_info_from_pdf.py
extracts text elements from a PDF document.
python src/extractpdf/extract_text_info_from_pdf.py
The sample class extract_text_table_info_from_pdf.py
extracts text and table elements from a PDF document.
python src/extractpdf/extract_text_table_info_from_pdf.py
The sample class extract_text_table_info_with_renditions_from_pdf.py
extracts text and table elements along with table renditions
from a PDF document. Note that the output is a zip containing the structured information along with renditions as described
above.
python src/extractpdf/extract_text_table_info_with_renditions_from_pdf.py
The sample class extract_text_table_info_with_figures_tables_renditions_from_pdf.py
extracts text and table elements along with figure
and table element renditions from a PDF document. Note that the output is a zip containing the structured information
along with renditions as described above.
python src/extractpdf/extract_text_table_info_with_figures_tables_renditions_from_pdf.py
The sample class extract_text_info_with_char_bounds_from_pdf.py
extracts text elements and bounding boxes for characters present in text blocks. Note that the output is a zip containing the structured information
along with renditions as described above.
python src/extractpdf/extract_text_info_with_char_bounds_from_pdf.py
Extract Text, Table Elements and bounding boxes for Characters present in text blocks with Renditions of Table Elements
The sample class extract_text_table_info_with_char_bounds_from_pdf.py
extracts text, table elements, bounding boxes for characters present in text blocks and
table element renditions from a PDF document. Note that the output is a zip containing the structured information
along with renditions as described above.
python src/extractpdf/extract_text_table_info_with_char_bounds_from_pdf.py
The sample class extract_text_table_info_with_table_structure_from_pdf.py
extracts text, table elements, table structures as CSV and
table element renditions from a PDF document. Note that the output is a zip containing the structured information
along with renditions as described above.
python src/extractpdf/extract_text_table_info_with_table_structure_from_pdf.py
The sample class extract_text_table_info_with_styling_from_pdf.py
extracts text and table elements along with the styling information of the text blocks.
Note that the output is a zip containing the structured information
along with renditions as described above.
python src/extractpdf/extract_text_table_info_with_styling_from_pdf.py
The sample class extract_text_from_pdf_exception_sample.py
highlights how to handle different types of exceptions. Place the invalid input PDF file in the resources/invalidinputs folder.
python src/extractpdf/extract_text_from_pdf_exception_sample.py <input file name>
This sample illustrates how to fetch properties of a PDF file.
The sample class get_pdf_properties.py
fetches the properties of an input PDF.
python src/pdfproperties/get_pdf_properties.py
These samples illustrate how to provide custom client configurations (timeouts, proxy, etc.).
The sample class create_pdf_with_custom_timeouts.py
highlights how to provide custom values for the connection timeout and socket timeout.
python src/customconfigurations/create_pdf_with_custom_timeouts.py
The sample class create_pdf_with_proxy_server.py
highlights how to provide proxy server configurations so that all API calls are routed through that proxy server.
python src/customconfigurations/create_pdf_with_proxy_server.py
The sample class create_pdf_with_authenticated_proxy_server.py
highlights how to provide proxy server configurations so that all API calls are routed through a proxy server that requires authentication.
python src/customconfigurations/create_pdf_with_authenticated_proxy_server.py
The sample class export_pdf_with_specified_region.py
highlights how to configure the SDK to process the documents in the specified region.
python src/customconfigurations/export_pdf_with_specified_region.py
These samples illustrate how to create a PDF document with enhanced readability from an existing PDF document. All tags from the input file will be removed except for existing alt-text images, and a new tagged PDF will be created as output. However, the generated PDF is not guaranteed to comply with accessibility standards such as WCAG and PDF/UA; you may need to perform further downstream remediation to meet those standards.
The sample project autotag_pdf.py
highlights how to add tags to a PDF document to make the PDF more accessible.
python src/autotagpdf/autotag_pdf.py
The sample project autotag_pdf_with_options.py
highlights how to add tags to PDF documents to make the PDF more accessible and also shift the headings in the output PDF file.
Also, it generates a tagging report which contains the information about the tags that the tagged output PDF document contains.
python src/autotagpdf/autotag_pdf_with_options.py
The sample project autotag_pdf_parametrised.py
highlights how to add tags to PDF documents to make the PDF more accessible by setting options through command line arguments.
Here is a sample list of command line arguments and their description:
--input < input file path >
--output < output file path >
--report { If this argument is present then the output will be generated with the tagging report }
--shift_headings { If this argument is present then the headings will be shifted in the output PDF document }
python src/autotagpdf/autotag_pdf_parametrised.py --input src/resources/autotagPDFInput.pdf --output output/AutotagPDFParameterised/ --shift_headings --report
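The four arguments listed above map naturally onto the standard argparse module. The sketch below shows one way such a command line could be parsed; it is not the sample's actual implementation, and marking --input and --output as required is an assumption made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the four documented command line arguments."""
    parser = argparse.ArgumentParser(description="Autotag a PDF document.")
    parser.add_argument("--input", required=True, help="input file path")
    parser.add_argument("--output", required=True, help="output file path")
    # Presence-only flags: they are False unless given on the command line.
    parser.add_argument("--report", action="store_true",
                        help="also generate the tagging report")
    parser.add_argument("--shift_headings", action="store_true",
                        help="shift the headings in the output PDF document")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.report, args.shift_headings)
```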
These samples illustrate how to use external input and output storage for the supported operations.
The sample class external_input_create_pdf_from_docx.py
creates a PDF file from a DOCX file stored in external storage.
python src/externalstorage/external_input_create_pdf_from_docx.py
Create a PDF File From a DOCX File Using External Input Storage and Store the Result in External Output Storage
The sample class external_input_and_output_create_pdf_from_docx.py
creates a PDF file from a DOCX file stored in external storage and stores the result in external output storage.
python src/externalstorage/external_input_and_output_create_pdf_from_docx.py
These samples illustrate how to check PDF files to see if they meet the machine-verifiable requirements of PDF/UA and WCAG 2.0.
The sample class pdf_accessibility_checker.py
checks the accessibility of an input PDF.
python src/pdfaccessibilitychecker/pdf_accessibility_checker.py
The sample class pdf_accessibility_checker_with_option.py
checks the accessibility of an input PDF for a given start page and end page.
python src/pdfaccessibilitychecker/pdf_accessibility_checker_with_option.py
These samples illustrate how to add a watermark to a PDF document.
The sample class pdf_watermark.py
adds a watermark with default appearance options to the PDF document.
python src/pdfwatermark/pdf_watermark.py
The sample class pdf_watermark_with_options.py
adds a watermark to a PDF document with custom appearance options and page range options.
python src/pdfwatermark/pdf_watermark_with_options.py
Contributions are welcome! Read the Contributing Guide for more information.
This project is licensed under the MIT License. See LICENSE for more information.