-
Kaggle dataset: the movies database contains over 10,000 movies, with several fields of information about each one. New columns:
homepage, id, original_title, overview, popularity, production_companies, production_countries, release_date, spoken_languages, status, tagline, vote_average
-
The CSV files can be found here.
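A minimal sketch of loading the data with pandas and turning the genres column into Python lists. It assumes, which is not stated above, that `genres` stores a JSON-encoded list of objects with `id` and `name` keys. The sample rows and the `parse_genres` helper are hypothetical, and the real file path still has to be filled in.

```python
import io
import json

import pandas as pd

# Hypothetical sample rows standing in for the real CSV file; the JSON layout
# of the `genres` column is an assumption about the data format.
SAMPLE_CSV = io.StringIO(
    'id,original_title,genres,vote_average\n'
    '1,Movie A,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}]",7.1\n'
    '2,Movie B,"[{""id"": 16, ""name"": ""Animation""}]",6.5\n'
    '3,Movie C,"[]",5.0\n'
)


def parse_genres(raw) -> list:
    """Turn a JSON-encoded list of {"id", "name"} dicts into a list of genre names."""
    # Missing values come back from read_csv as NaN (a float), so guard for that.
    if not isinstance(raw, str) or not raw.strip():
        return []
    return [item["name"] for item in json.loads(raw)]


# With the real data: movies = pd.read_csv("<path to the CSV>")
movies = pd.read_csv(SAMPLE_CSV)
movies["genre_list"] = movies["genres"].apply(parse_genres)
print(movies[["original_title", "genre_list"]])
```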
-
The notebook is built mainly around the genres column, which is used to create polished EDA (exploratory data analysis) graphs with the matplotlib package:
import matplotlib.pyplot as plt
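As an illustration of the kind of genre graph described above, here is a sketch of a bar chart that counts movies per genre. The genre lists are hypothetical stand-ins for the parsed genres column, and the output file name `genre_counts.png` is an arbitrary choice.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # off-screen rendering so this runs without a display; drop inside a notebook
import matplotlib.pyplot as plt

# Hypothetical, already-parsed genre lists (one list per movie).
genre_lists = [
    ["Action", "Adventure"],
    ["Animation", "Comedy"],
    ["Comedy", "Drama"],
    ["Drama"],
    ["Action", "Drama"],
]

# Count how many movies carry each genre (a movie counts once per genre it has).
genre_counts = Counter(genre for genres in genre_lists for genre in genres)
# Sort bars from the most to the least common genre.
labels, counts = zip(*genre_counts.most_common())

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(labels, counts, color="steelblue")
ax.set_title("Number of movies per genre")
ax.set_xlabel("Genre")
ax.set_ylabel("Number of movies")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
fig.savefig("genre_counts.png")
```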
-
The second task is to build a movie recommendation system based on the movies' genres, using scikit-learn's KMeans algorithm:
from sklearn.cluster import KMeans
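A minimal sketch of a genre-based recommender with scikit-learn's `KMeans`, assuming each movie's genres are already available as a list of names. Genres are one-hot encoded with `MultiLabelBinarizer`, the movies are clustered, and the recommendations are the other movies in the same cluster. The titles, genre lists, `n_clusters=3`, and the `recommend` helper are illustrative assumptions, not the notebook's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical titles and genre lists; in the notebook these would come
# from the parsed genres column.
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F"]
genre_lists = [
    ["Action", "Adventure"],
    ["Action", "Adventure", "Science Fiction"],
    ["Animation", "Family"],
    ["Animation", "Family", "Comedy"],
    ["Drama", "Romance"],
    ["Drama", "Romance", "History"],
]

# One-hot encode the genres: one binary column per distinct genre.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(genre_lists)

# Cluster the movies by genre profile; n_clusters=3 is an arbitrary choice here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)


def recommend(title: str, top_n: int = 5) -> list:
    """Return up to `top_n` other movies from the same KMeans cluster as `title`."""
    idx = titles.index(title)
    same_cluster = np.flatnonzero(cluster_labels == cluster_labels[idx])
    return [titles[i] for i in same_cluster if i != idx][:top_n]


print(recommend("Movie A"))
```

Clustering hands every movie in a cluster the same candidate set; ranking those candidates by their distance to the query movie would be a natural refinement.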
You are most welcome to fork my notebook and update my code. Below are some ideas that could be worked on:
- Can you categorize the films by type, such as animated or not? We don't have explicit labels for this, but it should be possible to build them from the crew's job titles.
- How sharp is the divide between major film studios and the independents? Do those two groups fall naturally out of a clustering analysis or is something more complicated going on?
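For the first idea, one possible heuristic is to flag a film as animated when enough crew members hold animation-related jobs. This sketch assumes the crew data can be loaded as a list of dicts with a `job` key per member; the `ANIMATION_JOBS` set, the `looks_animated` helper, and the threshold of two matches are hypothetical and would need tuning against the real data.

```python
# Hypothetical job titles that hint at an animated production; the real list
# would have to be built by inspecting the crew data.
ANIMATION_JOBS = {
    "Animation Director",
    "Animator",
    "Lead Animator",
    "Supervising Animator",
    "Character Designer",
}


def looks_animated(crew: list, min_matches: int = 2) -> bool:
    """Flag a film as animated if at least `min_matches` crew members hold animation jobs.

    `crew` is assumed to be a list of dicts with a "job" key, one per crew member.
    """
    matches = sum(1 for member in crew if member.get("job") in ANIMATION_JOBS)
    return matches >= min_matches


# Hypothetical crew lists for two films.
toon_crew = [{"job": "Director"}, {"job": "Animator"}, {"job": "Lead Animator"}]
live_action_crew = [{"job": "Director"}, {"job": "Director of Photography"}, {"job": "Animator"}]

print(looks_animated(toon_crew))         # True: two animation jobs
print(looks_animated(live_action_crew))  # False: only one animation job
```

Requiring more than one match is meant to reduce false positives from live-action films that hire a few animators for visual effects.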
The original notebook was built as a Kaggle kernel; please visit my notebook here and upvote it if you found something useful.