Fixkey/Movie-analyzer: Movie Data Aggregator and Analyzer using Python and TF-IDF weighting

Project structure

data - Source data
- douban-data.txt - Douban movie list (Chinese)
- filmweb_1939_2023.csv - Filmweb movie list (Polish)
- imdb-data.csv - IMDb movie list (English)
results - Processed data
- ratings.csv - Combined, cleaned and transformed data from the 3 movie sources
- plots.json - Scraped IMDb plot descriptions for the movies included in ratings.csv
- cluster_terms.csv - Clusters produced from the TF-IDF transformation of the plots
- id_to_cluster.csv - Assignment of each plot to its cluster in cluster_terms.csv
ratings.ipynb - Cleans, transforms and combines the ratings from the 3 movie databases
scraper.ipynb - Web scraper using the scrapy library to fetch the latest plot descriptions of movies from IMDb.com
plot-analyzer.ipynb - Cleans the plot data, applies a TF-IDF transformation to the texts and groups them into clusters using the k-means algorithm
dashboard.ipynb - Visualizes the results of the ratings analysis

Requirements

Python 3 (tested on 3.13.1)
Jupyter Notebook / Visual Studio Code (anything that opens .ipynb files)
pandas
scrapy
nltk
scikit-learn
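A quick way to confirm that the packages above are installed before opening the notebooks. This is a small standard-library sketch; note that scikit-learn is imported as `sklearn`, so its pip name and import name differ.

```python
import importlib.util
import sys

# pip package name -> import name for the requirements listed above.
REQUIRED = {"pandas": "pandas", "scrapy": "scrapy", "nltk": "nltk", "scikit-learn": "sklearn"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    missing = missing_packages()
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All requirements found.")
```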

How to run

Install Python 3 and the requirements with pip install, then run all .ipynb notebooks in this order:
ratings -> scraper -> plot-analyzer -> dashboard
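The notebooks can also be run in this order without opening them. A minimal sketch, assuming Jupyter's `nbconvert` is installed (it comes with the `jupyter` package but is not in the requirements list); `--execute --inplace` runs each notebook and saves its outputs back into the same file. scraper.ipynb still needs network access to imdb.com.

```python
import shutil
import subprocess

# Execution order given in this README.
NOTEBOOKS = ["ratings.ipynb", "scraper.ipynb", "plot-analyzer.ipynb", "dashboard.ipynb"]


def build_commands(notebooks=NOTEBOOKS):
    """One `jupyter nbconvert` command per notebook, in order."""
    return [["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
            for nb in notebooks]


def run_all(notebooks=NOTEBOOKS):
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter not found on PATH; install it with: pip install jupyter")
    for cmd in build_commands(notebooks):
        print("Running", cmd[-1])
        # check=True stops at the first notebook that fails, so later steps never see missing inputs.
        subprocess.run(cmd, check=True)
```

Call `run_all()` from the repository root so the notebooks find the data and results folders.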

Sources

filmweb: https://www.kaggle.com/code/kfc667/quick-filmweb-eda
imdb: https://www.kaggle.com/datasets/octopusteam/full-imdb-dataset
douban: https://www.kaggle.com/datasets/fengzhujoey/douban-datasetratingreviewside-information
scraping source: https://imdb.com/
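ratings.ipynb combines the three Kaggle sources above into ratings.csv. The notebook's exact steps are not reproduced here; this is a minimal sketch, assuming each source has already been cleaned into a table with hypothetical columns `title`, `year` and `rating`, with titles already matched across the Chinese, Polish and English lists. The maximum ratings in `MAX_RATING` are placeholders to check against the real files.

```python
import pandas as pd

# Placeholder maximum rating per source; check each dataset's actual scale.
MAX_RATING = {"imdb": 10, "filmweb": 10, "douban": 5}


def combine(sources):
    """Rescale each source to 0-10 and inner-join them on (title, year).

    `sources` maps a source name to a DataFrame with hypothetical columns
    title, year and rating (titles already matched across languages).
    """
    combined = None
    for name, df in sources.items():
        part = df[["title", "year"]].copy()
        # Put every source on the same 10-point scale.
        part[f"{name}_rating"] = df["rating"] * 10 / MAX_RATING[name]
        # Inner join keeps only movies present in every source seen so far.
        combined = part if combined is None else combined.merge(part, on=["title", "year"], how="inner")
    rating_cols = [f"{name}_rating" for name in sources]
    combined["mean_rating"] = combined[rating_cols].mean(axis=1)
    return combined


# Toy example with made-up ratings.
imdb = pd.DataFrame({"title": ["Alien", "Heat"], "year": [1979, 1995], "rating": [8.5, 8.3]})
filmweb = pd.DataFrame({"title": ["Alien"], "year": [1979], "rating": [7.9]})
douban = pd.DataFrame({"title": ["Alien", "Heat"], "year": [1979, 1995], "rating": [5, 4]})
print(combine({"imdb": imdb, "filmweb": filmweb, "douban": douban}))
```

An inner join is the strictest choice; a left join starting from IMDb would instead keep movies missing from the other sources, with NaN in their rating columns.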
