GitHub - lbventura/Churn-Prediction-Modeling: A step-by-step exposition of how to build a bank customer churn prediction model.

A step-by-step exposition of how to build a bank customer churn prediction model, written in considerable detail for pedagogical purposes. Inference is carried out with several statistical tests and logistic regression.

The main conclusions are:

The most important features for churn prediction are Age, NumOfProducts, EstimatedSalary, CreditScore, Balance, Geography and Gender.
Assuming that the bank's priority is to retain customers (over the incurred costs), the goal is to minimize the number of false negatives and maximize the number of true positives, even at the expense of more false positives. The target metric is therefore changed to $F_2$, which weights recall more heavily than precision. Under 5-fold cross-validation, the best-performing model was a Support Vector Classifier (SVC) with an $F_2$-score of 0.75. The best-performing logistic regression had an $F_2$-score of 0.68, compared to the naïve benchmark score of 0.67. Both models, however, predict far fewer false negatives.
Restricting the analysis to the relevant features identified by the Kolmogorov-Smirnov and Chi-Squared tests leads to a slight underperformance of the SVC but not of the logistic regression.
The model with the engineered features does not improve performance.
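
To make the choice of $F_2$ concrete, here is a minimal sketch of the F-beta score computed directly from confusion-matrix counts. The function name `fbeta_from_counts` and the counts are hypothetical illustrations, not taken from the notebook. The two example classifiers are chosen so that they tie on $F_1$, while $F_2$ prefers the one that misses fewer churners.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    With beta = 2, each false negative costs four times as much as a false positive.
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention: return 0.0 when there are no positives and no positive predictions.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for two classifiers on the same 100 churners.
few_missed = fbeta_from_counts(tp=80, fp=60, fn=20)   # misses 20 churners
many_missed = fbeta_from_counts(tp=60, fp=20, fn=40)  # misses 40 churners

print(round(few_missed, 3))   # 0.741
print(round(many_missed, 3))  # 0.625
```

With `beta=1` both sets of counts give the same score (2/3), so $F_1$ cannot tell the two classifiers apart. In scikit-learn, `sklearn.metrics.fbeta_score` with `beta=2` computes the same quantity from label arrays.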

Repository files:

utils_bin
.gitignore
BankChurnersCodeAlong.ipynb
Churn_Modelling.csv
LICENSE
README.md
requirements.txt
