- Robust scraping with rate limiting and retry mechanisms
- Advanced data cleaning and normalization
- Sentiment analysis categorization
- Efficient CSV export functionality
- Comprehensive error handling and logging
- Built-in protection against API rate limits
- Detailed comment metadata extraction
- Configurable scraping parameters
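The retry-with-rate-limiting behavior listed above can be sketched as follows. This is a minimal illustration, not the project's actual implementation; the `fetch_with_retry` helper and its parameters are hypothetical:

```python
import random
import time

def fetch_with_retry(fetch, max_retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on failure.

    Backoff plus random jitter spreads out retries so repeated
    failures do not hammer the server at a fixed interval.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

In the real scraper the `fetch` callable would wrap an HTTP request; here it is kept abstract so the retry logic stands on its own.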
- Python 3.8+
- Required packages (see `requirements.txt`):

```
beautifulsoup4==4.12.3
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
requests~=2.32.3
six==1.16.0
soupsieve==2.6
tzdata==2024.2
```
- Clone the repository:

```bash
git clone https://github.com/ChanMeng666/douban-review-scraper.git
```

- Navigate to the project directory:

```bash
cd douban-review-scraper
```

- Install dependencies:

```bash
pip install -r requirements.txt
```
Edit `config.py` to customize your scraping parameters:

```python
MOVIE_ID = 'your_movie_id'  # Douban movie ID
MAX_PAGES = 50              # Maximum pages to scrape
REQUEST_TIMEOUT = 30        # Request timeout in seconds
RETRY_TIMES = 3             # Number of retry attempts
```
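To show how `MOVIE_ID` and `MAX_PAGES` might drive pagination, here is a hedged sketch. The URL pattern and the `COMMENTS_PER_PAGE` constant are assumptions based on Douban's public comments pages, not values taken from this project:

```python
COMMENTS_PER_PAGE = 20  # assumed page size for Douban short comments

def build_comments_url(movie_id, page):
    """Build the URL for one page of a movie's short comments.

    Douban paginates comments via a `start` offset (assumed pattern);
    page 0 starts at offset 0, page 1 at offset 20, and so on.
    """
    start = page * COMMENTS_PER_PAGE
    return (f"https://movie.douban.com/subject/{movie_id}/comments"
            f"?start={start}&limit={COMMENTS_PER_PAGE}&status=P")

# A scraper loop would then iterate: for page in range(MAX_PAGES): ...
```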
- Configure your target movie ID in `config.py`
- Run the scraper:

```bash
python main.py
```
The scraper generates CSV files containing:

- `timestamp`: Comment timestamp
- `content`: Cleaned comment text
- `rating`: User rating (1-5)
- `user_id`: Douban user ID
- `category`: Comment category (positive/negative/neutral)
- Respect Douban's robots.txt and API limitations
- Update cookies periodically for reliable operation
- Consider using proxies for large-scale scraping
- Check Douban's terms of service before use
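For the cookie-refresh advice above, one convenient pattern is pasting the `Cookie` header from your browser's developer tools and parsing it into a dict. This helper is a hypothetical sketch, not part of the project:

```python
def parse_cookie_header(raw):
    """Parse a browser-copied Cookie header ("a=1; b=2") into a dict.

    The resulting dict can be passed to requests, e.g.
    requests.get(url, cookies=parse_cookie_header(raw)).
    """
    jar = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, _, value = pair.strip().partition("=")
            jar[name] = value
    return jar
```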
Contributions are welcome! Please feel free to submit a Pull Request. Here's how you can contribute:
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Chan Meng
- LinkedIn: chanmeng666
- GitHub: ChanMeng666
Give a ⭐️ if this project helped you!