PDF to Text and Table Extraction

This Python script extracts text and tables from a PDF file.

Usage

Place the PDF file you want to process in data/raw.
Update the pdf_path variable in the script so it points to your PDF file.
Set the output_folder variable to the folder where the extracted text file and CSV files should be saved (data/processed).
Run the script:

python src/main.py

The script will generate a text file containing the extracted text from the PDF, saved in the specified output folder.
Separate CSV files will be created for each table found in the PDF, named with the PDF's stem (filename without extension) and table number.
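
The output step described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of src/main.py: the README does not say which PDF library the script uses, so pdfplumber is an assumption here, and the helper names (table_csv_path, save_outputs, extract_pdf) and the exact file name pattern (stem_table_N.csv, stem.txt) are hypothetical.

```python
import csv
from pathlib import Path


def table_csv_path(pdf_path, output_folder, table_number):
    """Build a CSV path from the PDF's stem and a 1-based table number.

    The "<stem>_table_<n>.csv" pattern is an assumption, not the script's
    confirmed naming scheme.
    """
    stem = Path(pdf_path).stem
    return Path(output_folder) / f"{stem}_table_{table_number}.csv"


def save_outputs(pdf_path, output_folder, text, tables):
    """Write the text to one .txt file and each table to its own CSV file."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One text file per PDF, named after the PDF's stem (assumed pattern).
    text_path = out_dir / f"{Path(pdf_path).stem}.txt"
    text_path.write_text(text, encoding="utf-8")

    csv_paths = []
    for number, rows in enumerate(tables, start=1):
        csv_path = table_csv_path(pdf_path, out_dir, number)
        with csv_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            for row in rows:
                # Empty cells arrive as None; store them as empty strings.
                writer.writerow(["" if cell is None else cell for cell in row])
        csv_paths.append(csv_path)
    return text_path, csv_paths


def extract_pdf(pdf_path):
    """Collect page text and tables using pdfplumber (an assumed dependency)."""
    import pdfplumber  # imported lazily so the helpers above work without it

    texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            texts.append(page.extract_text() or "")
            tables.extend(page.extract_tables())
    return "\n".join(texts), tables


# Example usage (the file name is hypothetical):
#   text, tables = extract_pdf("data/raw/example.pdf")
#   save_outputs("data/raw/example.pdf", "data/processed", text, tables)
```

With these assumptions, a PDF named example.pdf containing two tables would produce example.txt, example_table_1.csv, and example_table_2.csv in the output folder.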
