AI Scraper Project

Welcome to the AI Scraper project repository! This project uses Python, Selenium, BeautifulSoup, and a language model served through Ollama to scrape, parse, and extract information from web pages.

Project Overview

The AI Scraper is designed to handle complex web scraping tasks, including captcha handling, HTML parsing, and structured data extraction with a language model.

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.8 or higher
pip (Python package installer)

Installation

Clone the Repository:

git clone https://github.com/umutkayash/AI-Scraper.git
cd AI-Scraper

Set Up a Virtual Environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install Dependencies:
```
pip install -r requirements.txt
```

Configuration

The scraper reads the Selenium WebDriver URL from the WEBDRIVER_URL environment variable. You can set it by creating a .env file in the project root with the following content:
```
WEBDRIVER_URL="your_webdriver_url_here"
```
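
As a rough illustration of how a script might pick up this setting, here is a minimal sketch that uses only the standard library. The helper names `load_env_file` and `get_webdriver_url` are hypothetical and are not taken from this repository; the project itself may load the file differently (for example with the python-dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_webdriver_url(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    url = os.environ.get("WEBDRIVER_URL") or load_env_file(path).get("WEBDRIVER_URL")
    if not url:
        raise RuntimeError("WEBDRIVER_URL is not set")
    return url
```

Failing early with a clear error when the variable is missing tends to be easier to debug than a connection error raised later by Selenium.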

Usage

To run the scraper:

Activate your virtual environment if not already activated:

source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Run the Scraper:
```
python main.py
```
Replace main.py with the script you wish to run.

How It Works

The AI Scraper performs the following steps:

Connects to a web page using Selenium.
Handles any captchas using configured settings.
Extracts HTML content and parses it using BeautifulSoup.
Segments the HTML content if necessary.
Uses a language model served by Ollama to extract specific information based on user-defined criteria.
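
The segmentation and extraction steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: the function names, the `max_chars` default of 6000, and the `extract` callable (a stand-in for whatever function sends a chunk and the criteria to the model) are all hypothetical.

```python
def split_into_chunks(text, max_chars=6000):
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def extract_from_chunks(chunks, extract, criteria):
    """Call extract(chunk, criteria) for each chunk and join the non-empty results."""
    results = []
    for chunk in chunks:
        result = extract(chunk, criteria)
        if result:
            results.append(result)
    return "\n".join(results)


if __name__ == "__main__":
    # Stand-in extractor: a real one would query the language model instead.
    def fake_extract(chunk, criteria):
        return chunk if criteria in chunk else ""

    chunks = split_into_chunks("price: 10 | name: Widget | price: 12", max_chars=12)
    print(extract_from_chunks(chunks, fake_extract, "price"))
```

Splitting on a fixed character count is the simplest option and can cut a value in half at a chunk boundary; splitting on element or paragraph boundaries would avoid that at the cost of more code.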

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.
