
Using classical and ML tools for feature engineering to build an explainable and robust model for unbiased recidivism prediction


shreshthsaini/UBR-UnBiased-and-Robust-Recidivism-Prediction


UBR-UnBiased-and-Robust-Recidivism-Prediction

Addressing Algorithmic Bias in Recidivism Score Predictions

Overview

This project aims to mitigate the inherent bias in recidivism score predictions generated by the National Institute of Justice. The existing algorithms tend to be biased with respect to gender and race/ethnicity, which can profoundly affect the lives of the individuals these predictions concern. We use explainable machine learning techniques to identify and reduce these biases.

Problem Statement

Recidivism scores play a crucial role in the criminal justice system, but they can perpetuate societal biases. This project aims to develop a fair and unbiased algorithmic decision-making model, focusing on gender and racial/ethnic groups.

Key Features

  • Bias Mitigation Techniques: Utilizing cutting-edge machine learning techniques to identify and mitigate biases in recidivism score predictions.

  • Fairness Evaluation Metrics: Implementing rigorous fairness evaluation metrics to assess the model's performance across different demographic groups.

  • Transparency and Explainability: Prioritizing transparency in the model's decision-making process and providing explanations for predictions to enhance accountability.
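The README does not state which fairness metrics the notebooks compute. As a minimal sketch of what a per-group evaluation can look like, the example below computes two common ones: the positive prediction rate (used for demographic parity) and the false positive rate, each broken down by group. The function names `group_rates` and `max_gap` and the toy data are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group positive prediction rate and false positive rate.

    y_true, y_pred: sequences of 0/1 labels (1 = recidivism).
    groups: sequence of group labels, e.g. values of a gender or race column.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "pred_pos": 0, "n": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += pred
        if truth == 0:
            c["neg"] += 1
            c["fp"] += pred  # predicted 1 although the true label is 0
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "positive_rate": c["pred_pos"] / c["n"],
            # FPR is undefined when a group has no true negatives.
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


# Toy data, invented purely for illustration.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
print(rates)
print("demographic parity gap:", max_gap(rates, "positive_rate"))
print("false positive rate gap:", max_gap(rates, "fpr"))
```

A gap near zero on either metric means the model treats the groups similarly on that criterion. The two criteria can conflict, so it is usually worth reporting both.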

Getting Started

  • Data: All the datasets used in this study are accessible within the data folder. The primary datasets are identified with the prefix NIJ_s_Recidivism_Challenge, comprising three test datasets and one training dataset. The Recidivism_Full_Dataset.csv consolidates these datasets by incorporating an additional column specifying whether each entry belongs to the training or test set. Furthermore, there are two cleaned versions of the dataset available: Recidivism_Data_Cleaned.csv and Recidivism_Full_Dataset_cleaned_shreshth.csv. These cleaned datasets represent the result of preprocessing steps.

  • Analysis: All models, developed in Jupyter notebooks (plus one STATA script), are available in the Modeling folder:

  1. Data Preparation: Refer to data_prep.ipynb for steps on cleaning and preparing the dataset.

  2. Model Development: Explore Models.ipynb for a detailed walkthrough of developing and evaluating machine learning models such as Logistic Regression, KNN, SVM, XGBoost, and CatBoost.

  3. Neural Network Models: Refer to nn_model_all.ipynb for insights into MLP neural network models.

  4. Logistic Regression Table: Execute Logistic Regression Table.do in STATA to produce detailed logistic regression results in tabular form, for evaluating the impact of different variables on the odds of recidivism.

  5. CatBoost Model: CatBoost gives the best performance among the models evaluated. Use /Modeling/trained_model/best_classifier_CatBoost.joblib to load the best-trained CatBoost model.

  6. CatBoost Information: Access additional information related to the CatBoost algorithm in the /Modeling/catboost_info folder.

  7. Trained Models: Find stored trained models in the /Modeling/trained_models/ folder.

  8. Raw model runs: Review training steps and predictions in the /Modeling/tacc/ folder.
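Since Recidivism_Full_Dataset.csv marks each row as training or test in an extra column, the original split can be recovered by filtering on that column. The sketch below shows the idea on an in-memory stand-in for the file. The column name `Split`, its values, and the helper `split_by_indicator` are assumptions made for illustration; check the actual CSV header before reusing it.

```python
import csv
import io


def split_by_indicator(rows, column, train_value):
    """Partition rows (dicts) into (train, test) using an indicator column."""
    train, test = [], []
    for row in rows:
        (train if row[column] == train_value else test).append(row)
    return train, test


# Stand-in for data/Recidivism_Full_Dataset.csv. The column names and values
# here are hypothetical; the real file's header may differ.
sample_csv = io.StringIO(
    "ID,Gender,Split\n"
    "1,M,train\n"
    "2,F,test\n"
    "3,M,train\n"
)
rows = list(csv.DictReader(sample_csv))
train_rows, test_rows = split_by_indicator(rows, column="Split", train_value="train")
print(len(train_rows), "training rows;", len(test_rows), "test rows")
```

To run this against the real file, replace `sample_csv` with a handle from `open(path, newline="")` and pass the actual indicator column name and training value.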

Contribution

Feel free to contribute, report issues, or suggest improvements. We welcome collaboration to enhance the robustness of recidivism prediction models.

Acknowledgments

This project was undertaken as a requirement for the Applied Machine Learning course offered by the Department of Electrical and Computer Engineering at the University of Texas at Austin, under the guidance of Professor Ghosh. The knowledge and skills gained in this course have been instrumental in the successful execution of this study.
