Athena75/pdf2text

PDF to Text and Table Extraction

This Python script extracts the text and tables from a PDF file. It writes the text to a text file and each table to its own CSV file.

Requirements

  • Python 3.x

Usage

  1. Place the PDF file you want to process in the data/raw folder.
  2. Update the pdf_path variable in the script with the path to your PDF file.
  3. Set the output_folder variable to the folder where the extracted text file and CSV files should be saved, for example data/processed.
  4. Run the script:
     python src/main.py
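Steps 1 to 3 amount to pointing the script at an input PDF and an output folder. The sketch below shows one way that setup could look, assuming pdf_path and output_folder are plain module-level variables. The file name report.pdf and the helper prepare_paths are hypothetical and are not taken from src/main.py.

```python
from pathlib import Path

# Hypothetical values; in src/main.py these are edited by hand (steps 2 and 3).
pdf_path = Path("data/raw/report.pdf")  # "report.pdf" is a placeholder name
output_folder = Path("data/processed")


def prepare_paths(pdf_path: Path, output_folder: Path) -> Path:
    """Check that the input PDF exists and make sure the output folder exists.

    Hypothetical helper for illustration only. Returns the output folder so
    callers can build output file names inside it.
    """
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {pdf_path.name}")
    # parents=True also creates data/ if needed; exist_ok=True avoids an error on reruns.
    output_folder.mkdir(parents=True, exist_ok=True)
    return output_folder
```

Calling prepare_paths(pdf_path, output_folder) before extraction fails early with FileNotFoundError if step 1 was skipped, instead of failing partway through processing.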

Output

  • The text extracted from the PDF is saved as a text file in the output folder.
  • A separate CSV file is created for each table found in the PDF, named with the PDF's stem (the file name without its extension) and the table number.
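The output naming can be illustrated with a short sketch that uses only the standard library. It assumes the text and tables have already been extracted (this README does not say which PDF library the script uses), with each table given as a list of rows. The exact file-name patterns `<stem>.txt` and `<stem>_table_<n>.csv`, and the helper save_outputs, are hypothetical; the real script may format the names differently.

```python
import csv
from pathlib import Path
from typing import List, Sequence


def save_outputs(
    pdf_path: Path,
    output_folder: Path,
    text: str,
    tables: Sequence[Sequence[Sequence[str]]],
) -> List[Path]:
    """Write extracted text and tables into output_folder.

    Hypothetical naming: <stem>.txt for the text and <stem>_table_<n>.csv
    for each table, with n counted from 1.
    """
    output_folder.mkdir(parents=True, exist_ok=True)
    stem = pdf_path.stem  # file name without the .pdf extension
    written = []

    text_file = output_folder / f"{stem}.txt"
    text_file.write_text(text, encoding="utf-8")
    written.append(text_file)

    for n, rows in enumerate(tables, start=1):
        csv_file = output_folder / f"{stem}_table_{n}.csv"
        # newline="" lets the csv module control line endings, as its docs recommend.
        with csv_file.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        written.append(csv_file)
    return written
```

Keying every file on the PDF's stem keeps outputs from different PDFs apart when they share one output folder.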
