This project aims to predict student performance and provide educational interventions based on influential factors. We use the Student Performance Data Set, combining math and Portuguese language datasets to build a predictive model and offer tailored recommendations.
- Installation
- Usage
- Project Overview
- Model Training
- Evaluation
- Feature Importance
- Educational Interventions
- Example Usage
- Mathematical Concepts and Formulas
- Clone the repository:
git clone https://github.com/yourusername/student-performance-prediction.git
- Navigate to the project directory:
cd student-performance-prediction
- Install the required packages:
pip install -r requirements.txt
- Ensure you have the dataset files (
student-mat.csv
andstudent-por.csv
) in the project directory. - Run the script:
python main.py
This project involves the following steps:
- Loading and combining the student performance datasets.
- Cleaning and preprocessing the data.
- Splitting the data into training and testing sets.
- Training a RandomForestRegressor model to predict student performance.
- Evaluating the model's performance using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
- Analyzing feature importance to understand influential factors.
- Providing educational interventions based on model predictions.
The model training involves data loading, preprocessing, feature and label separation, train-test split, and training the RandomForestRegressor model.
Evaluate the model using MAE and RMSE to measure its performance.
Analyze feature importance to understand the influential factors that affect student performance.
Provide tailored recommendations based on influential features identified during the model training process.
Example of recommending interventions for a student, showcasing how the model's predictions can be used to provide personalized educational support.
The Mean Absolute Error is calculated as:
where ( y_i ) is the actual value and ( \hat{y}_i ) is the predicted value.
The Root Mean Squared Error is calculated as:
where ( y_i ) is the actual value and ( \hat{y}_i ) is the predicted value.
Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputting the mean prediction of the individual trees. The model combines the predictions from many decision trees to improve the overall prediction accuracy and control overfitting.
This project is licensed under the MIT License. See the LICENSE file for details.