The Netflix Userbase Analytics project analyzes Netflix's user behavior and content trends using publicly available datasets. Built on an AWS data pipeline (S3, Glue with PySpark, and RDS for PostgreSQL) with Tableau dashboards on top, the project aims to uncover actionable insights into customer preferences, subscription patterns, and content popularity. These insights can help OTT platforms like Netflix make data-driven decisions to optimize their services and grow their userbase.
This project addresses key questions for Netflix's business strategy, including:
- Understanding subscription patterns:
  - Subscription plans by user demographics.
  - Monthly revenue generation trends.
- Analyzing content consumption:
  - Device usage for streaming.
  - Most-watched genres, directors, and actors.
- Assessing Netflix's growth areas:
  - Preferences between movies and TV shows.
  - Regional content performance.
Data Sources:
- Publicly available Netflix datasets in CSV format.
Data Collection & Storage:
- Raw data is stored in AWS S3 buckets for centralized management.
Data Extraction & Transformation:
- AWS Glue and PySpark are used to clean, transform, and enrich data.
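The cleaning and enrichment step can be illustrated with the per-record logic a PySpark job might apply. This is a minimal plain-Python sketch, assuming a hypothetical raw schema (`user_id`, `join_date` in `DD-MM-YY` form, `country`, `monthly_revenue`); the actual Glue job would express the same rules with PySpark DataFrame operations rather than Python dictionaries.

```python
from datetime import datetime
from typing import Optional


def clean_user_record(raw: dict) -> Optional[dict]:
    """Clean and enrich one raw user record (hypothetical schema).

    Expected keys: user_id, join_date ('DD-MM-YY'), country, monthly_revenue.
    Returns None for records that cannot be repaired so the caller can drop them.
    """
    try:
        user_id = int(raw["user_id"])  # int() tolerates surrounding whitespace
        joined = datetime.strptime(raw["join_date"].strip(), "%d-%m-%y").date()
        revenue = float(raw["monthly_revenue"])
    except (KeyError, ValueError, TypeError, AttributeError):
        return None  # missing field or unparseable id, date, or revenue

    # Collapse repeated whitespace and title-case free-text country names.
    country = " ".join((raw.get("country") or "").split()).title()

    return {
        "user_id": user_id,
        "join_date": joined.isoformat(),
        # Enrichment: a year-month key used later for monthly revenue trends.
        "join_month": joined.strftime("%Y-%m"),
        "country": country or "Unknown",
        "monthly_revenue": round(revenue, 2),
    }
```

Records that come back as `None` would be filtered out before the dimension and fact tables are built, so malformed rows never reach the warehouse.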
Data Loading:
- Transformed data is loaded into an AWS RDS (PostgreSQL) data warehouse.
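Loading can be sketched as a single transactional batch insert, so a failed load leaves the warehouse unchanged rather than half-written. This is a minimal sketch, assuming a hypothetical staging table; Python's built-in `sqlite3` stands in for the RDS PostgreSQL target so the example runs locally. Against PostgreSQL the same pattern would go through a driver such as psycopg2 (which uses `%s` placeholders instead of `?`) or, for example, a JDBC connection from the Glue job.

```python
import sqlite3
from typing import Iterable


def load_rows(conn: sqlite3.Connection, table: str, columns: list[str],
              rows: Iterable[tuple]) -> int:
    """Insert all rows in one transaction; roll back everything if any row fails.

    Table and column names are hypothetical and must come from trusted code,
    because SQL identifiers cannot be bound as query parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.executemany(sql, rows)
    return cur.rowcount  # for executemany, sqlite3 sums the modified rows


# Usage: an in-memory database with a hypothetical staging table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_users (user_id INTEGER PRIMARY KEY, country TEXT)")
loaded = load_rows(conn, "staging_users", ["user_id", "country"], [(1, "Spain"), (2, "India")])
```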
Visualization:
- Interactive dashboards created in Tableau visualize key insights and KPIs.
S3 Buckets:
- CSV files are organized into folders within S3 buckets.
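One way to organize the CSV files is a fixed key layout per dataset. The sketch below assumes a hypothetical `raw/<dataset>/ingest_date=YYYY-MM-DD/<file>.csv` layout; the `name=value` folder segment follows the Hive-style partition convention that AWS Glue crawlers can recognize as a partition column. The resulting string would be passed as the `Key` argument when uploading, for example with boto3's `upload_file`.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_object_key(dataset: str, ingest_date: date, filename: str) -> str:
    """Build a key such as 'raw/users/ingest_date=2024-03-01/users.csv'
    (hypothetical layout)."""
    if not filename.endswith(".csv"):
        raise ValueError("the raw zone is expected to hold CSV files only")
    partition = f"ingest_date={ingest_date.isoformat()}"
    return str(PurePosixPath("raw", dataset, partition, filename))


def parse_object_key(key: str) -> dict:
    """Invert raw_object_key, recovering the dataset and partition value."""
    zone, dataset, partition, filename = PurePosixPath(key).parts
    name, _, value = partition.partition("=")
    return {"zone": zone, "dataset": dataset, name: value, "filename": filename}
```

Keeping the layout in one helper means uploaders and crawlers agree on folder names, and a new ingest date simply becomes a new partition.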
AWS Glue ETL Jobs:
- Glue crawlers scan the raw datasets in S3, infer their schemas, and populate the AWS Glue Data Catalog.
- PySpark scripts handle dimension and fact table transformations.
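The dimension and fact transformations follow the usual star-schema pattern: assign a surrogate key to each distinct dimension value, then replace those values in the fact rows with their keys. This is a plain-Python sketch of that logic, assuming hypothetical fields (`plan`, `device`, `join_month`, `monthly_revenue`); in a Glue PySpark job the same steps would typically use DataFrame operations such as `distinct()` and a `join`.

```python
def build_dimension(values) -> dict:
    """Map each distinct natural value to a surrogate key (1, 2, 3, ...).

    Keys follow sorted order so reruns over the same data produce the same keys.
    """
    return {value: key for key, value in enumerate(sorted(set(values)), start=1)}


def build_fact_rows(records: list[dict]) -> tuple[dict, dict, list[tuple]]:
    """Split cleaned records into two dimensions and one fact table.

    Each fact row is (user_id, plan_key, device_key, join_month, monthly_revenue).
    """
    dim_plan = build_dimension(r["plan"] for r in records)
    dim_device = build_dimension(r["device"] for r in records)
    facts = [
        (r["user_id"], dim_plan[r["plan"]], dim_device[r["device"]],
         r["join_month"], r["monthly_revenue"])
        for r in records
    ]
    return dim_plan, dim_device, facts
```

Storing small integer keys in the fact table keeps it narrow, while descriptive text such as plan names lives once in its dimension.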
Data Warehouse:
- Processed data is stored in AWS RDS (PostgreSQL) as a star schema of fact and dimension tables for OLAP-style analysis.
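A star schema and a typical analytical query against it might look like the following. The table and column names are hypothetical, and `sqlite3` stands in for RDS PostgreSQL so the sketch runs locally; the DDL and the query use plain SQL that PostgreSQL also accepts. The query answers one of the project's questions, monthly revenue by subscription plan.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_plan   (plan_key   INTEGER PRIMARY KEY, plan_name   TEXT NOT NULL);
CREATE TABLE dim_device (device_key INTEGER PRIMARY KEY, device_name TEXT NOT NULL);
CREATE TABLE fact_subscription (
    user_id         INTEGER NOT NULL,
    plan_key        INTEGER NOT NULL REFERENCES dim_plan(plan_key),
    device_key      INTEGER NOT NULL REFERENCES dim_device(device_key),
    year_month      TEXT    NOT NULL,  -- 'YYYY-MM'
    monthly_revenue NUMERIC NOT NULL
);
"""

# OLAP-style rollup: join the fact table to a dimension, then aggregate.
REVENUE_BY_MONTH_AND_PLAN = """
SELECT f.year_month, p.plan_name, SUM(f.monthly_revenue) AS revenue
FROM fact_subscription AS f
JOIN dim_plan AS p ON p.plan_key = f.plan_key
GROUP BY f.year_month, p.plan_name
ORDER BY f.year_month, p.plan_name;
"""


def create_warehouse() -> sqlite3.Connection:
    """Create an in-memory database containing the hypothetical star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


# Usage with a few made-up rows.
conn = create_warehouse()
conn.executemany("INSERT INTO dim_plan VALUES (?, ?)", [(1, "Basic"), (2, "Premium")])
conn.executemany("INSERT INTO dim_device VALUES (?, ?)", [(1, "Laptop"), (2, "Smart TV")])
conn.executemany(
    "INSERT INTO fact_subscription VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, "2022-01", 10), (2, 2, 1, "2022-01", 15),
     (3, 1, 1, "2022-01", 12), (4, 1, 1, "2022-02", 10)],
)
revenue_rows = conn.execute(REVENUE_BY_MONTH_AND_PLAN).fetchall()
```

Dashboards can then query small aggregates like this one instead of scanning raw CSV files.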
Automation:
- AWS Glue workflows orchestrate the crawlers and ETL jobs end-to-end.
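The ordering a workflow enforces can be modeled as a dependency graph: a job starts only after all of its prerequisites finish, which is roughly what conditional triggers in a Glue workflow express. This sketch uses Python's `graphlib` to group hypothetical job names into stages whose members could run in parallel; it models the ordering only and does not call any AWS API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs that must finish first.
PIPELINE = {
    "crawl_raw_csv": set(),
    "transform_dim_plan": {"crawl_raw_csv"},
    "transform_dim_device": {"crawl_raw_csv"},
    "transform_fact_subscription": {"transform_dim_plan", "transform_dim_device"},
    "load_to_rds": {"transform_fact_subscription"},
}


def execution_stages(graph: dict) -> list[list[str]]:
    """Group jobs into stages; every job in a stage can run in parallel
    because all of its prerequisites finished in earlier stages."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # in a real run, mark done only after jobs succeed
    return stages
```

For `PIPELINE`, the crawler runs first, the two dimension jobs share the second stage, and the fact job and the load follow in order.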
Dashboards:
- Tableau dashboards showcase key insights like user preferences and revenue trends.
This project equips stakeholders with actionable insights into Netflix's userbase and content strategy. By combining cloud data engineering on AWS with Tableau visualization, it demonstrates how data-driven decision-making can improve customer satisfaction and support business growth.