# Web Scraper
This README provides essential information for setting up and running the Web Scraper application.

## Prerequisites

Before you can run the Web Scraper application, make sure the following are installed on your system:
- Python (3.10 or higher)
- pip (Python package manager)
To install the required Python packages, navigate to the project directory and run:

```
pip install -r requirements.txt
```
Additionally, Web Scraper relies on the `wkhtmltopdf` tool for generating PDFs. On Ubuntu, you can install it with:

```
sudo apt-get install wkhtmltopdf
```
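PDFs are produced by driving `wkhtmltopdf` from Python. As a rough illustration of the mechanism, the sketch below saves a single URL as a PDF using the `pdfkit` wrapper library; whether `scraper.py` actually uses `pdfkit` is an assumption, so treat this as a sketch rather than the application's code.

```python
# Illustrative sketch: pdfkit is one common Python wrapper around wkhtmltopdf.
# Assumption: wkhtmltopdf is installed and available on your PATH.
import pdfkit

# Render a single page to a local PDF file.
pdfkit.from_url("https://example.com", "example.pdf")
```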
## Usage

To start scraping and recursively saving URLs as PDFs, use the following command:

```
python scraper.py <base url>
```
Replace `<base url>` with the URL you want to use as the root of your scraping task. The application will start from the provided URL and recursively follow links on the pages to save them as PDFs.
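To make the recursive behavior concrete, here is a simplified sketch of a same-domain, breadth-first crawl that saves each visited page as a PDF. This is illustrative only, not the actual `scraper.py` implementation; the `requests`, `beautifulsoup4`, and `pdfkit` dependencies and the page limit are assumptions.

```python
# Simplified illustration of recursive scraping; not the actual scraper.py code.
# Assumptions: the requests, beautifulsoup4, and pdfkit packages are installed.
from collections import deque
from urllib.parse import urljoin, urlparse

import pdfkit
import requests
from bs4 import BeautifulSoup

def crawl_to_pdfs(base_url: str, max_pages: int = 50) -> None:
    """Breadth-first crawl from base_url, saving each visited page as a PDF."""
    domain = urlparse(base_url).netloc
    queue, seen = deque([base_url]), {base_url}
    saved = 0

    while queue and saved < max_pages:
        url = queue.popleft()
        # Name the PDF after the URL path, falling back to "index" for the root.
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        pdfkit.from_url(url, f"{name}.pdf")
        saved += 1

        # Parse the page and queue unseen links that stay on the same domain.
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            target = urljoin(url, anchor["href"])
            if urlparse(target).netloc == domain and target not in seen:
                seen.add(target)
                queue.append(target)

crawl_to_pdfs("https://example.com")
```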
You can also save URLs from a file as PDFs instead of crawling. To do this, use the following command:

```
python scraper.py --urls-to-dump <file_name>
```

Replace `<file_name>` with the name of the file containing the list of URLs to save as PDFs. In this mode, the application processes each URL in the file and saves it as a PDF without scraping additional reference links.
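One way the two modes could be wired up on the command line is sketched below. This is hypothetical; the real `scraper.py` may parse its arguments differently.

```python
# Hypothetical sketch of the command-line interface; the real scraper.py
# may differ. Shows how the two modes could be distinguished.
import argparse

parser = argparse.ArgumentParser(description="Save web pages as PDFs.")
parser.add_argument("base_url", nargs="?", help="root URL for recursive scraping")
parser.add_argument("--urls-to-dump", help="file with URLs to save as PDFs")
args = parser.parse_args()

if args.urls_to_dump:
    # File mode: save each listed URL without following links.
    with open(args.urls_to_dump) as f:
        urls = [line.strip() for line in f if line.strip()]
    print(f"Would save {len(urls)} URLs as PDFs")
elif args.base_url:
    # Recursive mode: crawl outward from the base URL.
    print(f"Would crawl recursively from {args.base_url}")
else:
    parser.error("provide a base URL or --urls-to-dump")
```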
## Examples

Here are a few example commands for running the Web Scraper application:

```
python scraper.py https://example.com
python scraper.py https://anotherwebsite.com
```
Here is an example command for saving URLs from a file as PDFs:

```
python scraper.py --urls-to-dump=my_urls.txt
```

In this example, the `my_urls.txt` file should contain the list of URLs you want to save as PDFs. The application will process each URL in the file and save it as a PDF.
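Assuming the file lists one URL per line (a plausible convention; confirm against `scraper.py`), a hypothetical `my_urls.txt` might look like:

```
https://example.com/docs/getting-started
https://example.com/docs/configuration
https://anotherwebsite.com/blog/post-1
```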
Please adjust the command and file name to match your specific use case. Ensure you have installed the required Python packages and the `wkhtmltopdf` tool as mentioned in the "Prerequisites" section before running the application.