oelbourki/End-to-End-Ml-Boston-house-pricing

End-to-End Machine Learning: Boston House Price Prediction

This project demonstrates a complete machine learning pipeline for predicting house prices in Boston using the Boston Housing dataset. It covers data preprocessing, exploratory analysis, model training, evaluation, and deployment as a Flask web application.

Project Structure

├── app.py            # Flask app for model deployment
├── model.ipynb       # Jupyter notebook: data analysis, preprocessing, training & evaluation
├── regmodel.pkl      # Saved trained machine learning model
├── scaler.pkl        # Saved data scaler for consistent input transformation
├── requirements.txt  # Project dependencies
├── Dockerfile        # Instructions for building a Docker image
├── Procfile          # Process management for deployment (e.g., on Heroku)
└── templates/        # HTML templates for the web app
    └── home.html     # Main page for user interaction

Key Features

  • End-to-end ML pipeline: Covers all stages from raw data to deployed model.
  • Missing value imputation: Employs KNN and MICE imputation to handle missing data effectively.
  • Exploratory Data Analysis (EDA): Uses visualizations and correlation analysis to gain insights.
  • Linear Regression model: A simple, interpretable baseline for predicting housing prices.
  • Model evaluation: Reports MAE, MSE, RMSE, R-squared, and adjusted R-squared.
  • Flask web app: Provides an interactive platform for users to input data and get predictions.
  • Dockerization: Simplifies deployment and ensures environment consistency.

Project Stages

1. Data Analysis and Preprocessing (model.ipynb)

  • Data Loading: Loads the Boston Housing dataset into the notebook.
  • Exploratory Data Analysis (EDA):
    • Visualizes feature distributions using histograms, scatter plots, etc.
    • Calculates correlation between features and target variable to understand relationships.
    • Identifies potential multicollinearity (high correlation between independent features).
  • Missing Value Handling:
    • Uses KNN imputation to fill each missing value from the most similar rows (nearest neighbors).
    • Applies MICE (Multivariate Imputation by Chained Equations), which models each feature with missing values as a function of the other features.
  • Data Splitting: Divides the data into training and testing sets for model building and validation.
  • Feature Scaling: Standardizes features using StandardScaler to ensure consistent scaling for the model.

2. Model Training and Evaluation (model.ipynb)

  • Model Selection: A Linear Regression model is chosen for its simplicity and interpretability.
  • Model Training: The model learns patterns from the training data to predict house prices.
  • Model Prediction: Predictions are made on the unseen test data to evaluate performance.
  • Performance Metrics:
    • Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.
    • Mean Squared Error (MSE): Average squared difference between predicted and actual values.
    • Root Mean Squared Error (RMSE): Square root of MSE, expressed in the same units as the target variable.
    • R-squared (R²): Proportion of variance in the target variable explained by the model.
    • Adjusted R-squared: R² adjusted for the number of predictors, penalizing model complexity.
  • Residual Analysis: Examines residuals (actual minus predicted values) to check model assumptions such as constant variance and approximately normal errors.
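
The metrics listed above follow directly from their definitions. The sketch below computes them in plain Python so the formulas are visible; the function name `regression_metrics` and the sample values are illustrative assumptions, and the notebook itself may well rely on library functions instead.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return MAE, MSE, RMSE, R^2 and adjusted R^2 for paired values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]  # actual - predicted

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R^2 penalizes the number of predictors (n_features).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
metrics = regression_metrics(y_true, y_pred, n_features=1)
```

Adjusted R² is only defined when the number of samples exceeds the number of predictors plus one; with the 13 Boston features, the test set needs at least 15 rows.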

3. Model Deployment with Flask (app.py)

  • Flask App Creation: A simple Flask web application is created to serve the model.
  • Model Loading: The trained model and data scaler are loaded from pickle files for inference.
  • Routing:
    • /: Renders the main HTML template (home.html) for user interaction.
    • /predict: Handles POST requests, preprocesses input data, generates predictions, and sends them back to the user.
  • HTML Template (home.html):
    • Creates a form for users to input feature values.
    • Displays the model's prediction dynamically upon submission.
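
The core of the /predict flow (read form values, scale them, predict) can be sketched independently of Flask. This is an assumption-laden illustration rather than the contents of app.py: the function `predict_from_form` is hypothetical, the form field names are assumed to match the 13 Boston feature names, and the two stand-in classes only mimic the `transform` and `predict` methods that the unpickled scaler and model would provide.

```python
# Feature order of the Boston Housing dataset; the form is ASSUMED to use
# these names for its input fields.
FEATURE_ORDER = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def predict_from_form(form, scaler, model, feature_order=FEATURE_ORDER):
    """Turn submitted form values into one prediction.

    `form` is any mapping of field name -> string (for example the data of
    a POST request); `scaler` and `model` are the loaded objects.
    """
    # Form values arrive as strings, so convert them in a fixed order.
    row = [float(form[name]) for name in feature_order]
    scaled = scaler.transform([row])        # same scaling as during training
    return float(model.predict(scaled)[0])  # one row in, one price out


class IdentityScaler:
    """Hypothetical stand-in for the scaler loaded from scaler.pkl."""

    def transform(self, rows):
        return rows


class SumModel:
    """Hypothetical stand-in for the model loaded from regmodel.pkl."""

    def predict(self, rows):
        return [sum(row) for row in rows]


# Simulated form submission: every field set to "1".
form = {name: "1" for name in FEATURE_ORDER}
prediction = predict_from_form(form, IdentityScaler(), SumModel())
```

In a Flask route, a function like this would be called with the request's form data and the result passed to the template for display. Keeping the conversion logic in a plain function also makes it easy to test without starting the web server.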

Getting Started

Prerequisites

  • Python 3.7+
  • pip (package installer for Python)

Installation

  1. Clone the repository:

    git clone https://github.com/oelbourki/End-to-End-Ml-Boston-house-pricing.git
    cd End-to-End-Ml-Boston-house-pricing
  2. Create a virtual environment (recommended):

    python3 -m venv venv
    source venv/bin/activate
  3. Install required libraries:

    pip install -r requirements.txt

Running the Application

  1. Run the Flask app:

    flask run
  2. Access the web application in your browser at http://127.0.0.1:5000/.

Future Enhancements

  • Experiment with More Models: Explore other regression algorithms (Ridge, Lasso, Random Forest) to potentially improve accuracy.
  • Feature Engineering: Engineer new features from existing ones to potentially enhance model performance.
  • Web App Enhancement:
    • Improve the user interface and design for a more engaging user experience.
    • Implement input validation to ensure data integrity.
  • Cloud Deployment: Deploy the application to a cloud platform (AWS, Heroku, GCP) for scalability and accessibility.
  • Model Monitoring: Implement mechanisms to monitor the model's performance over time and retrain as needed.
