Classification model for the Credit Card Fraud dataset on Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
The approach and result in brief:
- Undersampling and oversampling techniques are used to learn from the imbalanced dataset (fraud transactions make up only 0.17% of all transactions).
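- Recall, rather than accuracy, is the key metric used to validate the model: with only 0.17% fraud, a model that labels every transaction as legitimate would be about 99.83% accurate while catching no fraud at all.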
- Conclusion: Logistic regression trained on the oversampled dataset gives the best recall (92%) and the best ROC curve.
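
The resampling and metric ideas above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the helper names (`random_undersample`, `random_oversample`, `recall`, `accuracy`) are hypothetical, the labels are synthetic toy data rather than the Kaggle file, and plain random resampling is assumed (the project may use a different technique such as SMOTE).

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Drop majority-class rows at random until all classes have equal size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_min = min(len(group) for group in by_class.values())
    pairs = [(row, label)
             for label, group in by_class.items()
             for row in rng.sample(group, n_min)]
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes have equal size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_max = max(len(group) for group in by_class.values())
    pairs = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(n_max - len(group))]
        pairs.extend((row, label) for row in group + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy labels (assumption): 10,000 rows, 17 of them fraud (0.17%).
labels = [1] * 17 + [0] * 9983
rows = list(range(len(labels)))  # stand-in for real feature rows

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)
print(Counter(under_labels))  # both classes reduced to the minority size
print(Counter(over_labels))   # both classes raised to the majority size

# A "model" that always predicts legitimate: high accuracy, zero recall.
always_legit = [0] * len(labels)
print(accuracy(labels, always_legit))  # 0.9983
print(recall(labels, always_legit))    # 0.0
```

In practice, resampling is usually applied to the training split only, so that recall is measured on data with the original class balance.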