This project involves predicting whether a user will make a purchase based on their age and salary using social media ad purchases data. The model leverages PySpark to build and evaluate machine learning pipelines with Decision Trees and Random Forests, achieving 90% accuracy.
The goal of this project is to analyze social media ad purchases data and predict user behavior. The dataset includes features such as age and salary, and the task is to predict whether a user will buy a product.
- PySpark: A Python library for Apache Spark that provides an interface for programming Spark with Python.
- Decision Trees: A decision support tool that uses a tree-like graph of decisions and their possible consequences.
- Random Forests: An ensemble learning method that constructs multiple decision trees and merges them together to improve accuracy and control over-fitting.
To run this project, you need to have PySpark installed. You can install PySpark using pip:
pip install pyspark