modelling_week_2019

Credit card fraud detection problem for the XIII Modelling Week, held at the Faculty of Mathematics of the Universidad Complutense de Madrid (UCM) on 10-14 June 2019. The Modelling Week is open to students of the Master in Mathematical Engineering at UCM, as well as to participants from other mathematically oriented master's programs worldwide. Its purpose is to teach and guide students through solving a realistic industry problem.

The problem can be approached in three ways: supervised, unsupervised and mixed. We will start with a supervised approach, since it is simpler. If time permits, we will also explore unsupervised methods (a really interesting field).

Python libraries

jupyter, pandas, matplotlib, seaborn, sklearn, tensorflow, keras, imblearn, xgboost
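
A quick way to see which of these libraries are already available in an environment is sketched below. It uses only the standard library and treats the names above as import names; the helper name `check_libraries` is hypothetical and not part of this repository. Note that pip package names can differ from import names (for example, `sklearn` is installed as `scikit-learn` and `imblearn` as `imbalanced-learn`).

```python
import importlib.util

# Import names as listed in this README (assumed to be import names, not pip names).
LIBRARIES = [
    "jupyter", "pandas", "matplotlib", "seaborn", "sklearn",
    "tensorflow", "keras", "imblearn", "xgboost",
]


def check_libraries(names=LIBRARIES):
    """Map each import name to True if it can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print a simple availability report, one library per line.
    for name, found in check_libraries().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Locating a module with `find_spec` avoids the start-up cost of actually importing heavy packages such as tensorflow.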

Outline

  • Basic programming with Python and Jupyter
  • Exploratory data analysis, cleaning and preprocessing. Feature engineering.
  • Overfitting. Validation scheme. Difference between train, validation and test sets.
  • Metrics: precision, recall, ROC curve, AUC (ROC), F1, confusion matrix. Focus on unbalanced datasets.
  • Classification algorithms in sklearn. Comments on hyperparameter tuning.
  • xgboost in Python using the xgboost.sklearn API.
  • Combination of models. Calibration. Ensembling and Stacking.
  • Neural Networks in keras:
    • Feed Forward Neural Network for classification.
    • Autoencoder as an anomaly detector (semi-supervised and unsupervised)
    • Autoencoder as a feature builder (unsupervised)
  • Combination of unsupervised and supervised methods.
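
To illustrate why the metrics item above focuses on unbalanced datasets, here is a minimal sketch in plain Python. The data are invented for illustration (1000 transactions with 10 frauds), and the helper names `confusion_counts`, `accuracy` and `precision_recall_f1` are hypothetical. In the course itself, the equivalent functions from `sklearn.metrics` (`confusion_matrix`, `precision_score`, `recall_score`, `f1_score`) would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means fraud."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of transactions classified correctly."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the fraud class (0.0 when undefined)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy data: 10 frauds followed by 990 legitimate transactions.
y_true = [1] * 10 + [0] * 990

# Baseline that never flags anything.
always_legit = [0] * 1000

# Hypothetical detector: catches 8 of the 10 frauds and raises 12 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

if __name__ == "__main__":
    for name, y_pred in [("always_legit", always_legit), ("detector", detector)]:
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
              f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On this toy data the baseline that never flags a transaction has higher accuracy than the detector, yet its recall is zero. This is why precision, recall and F1 are more informative than accuracy when frauds are rare.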

Cheatsheets

Resources

Bibliography

  • Leo Breiman, "Statistical Modeling: The Two Cultures" (2001) (Breiman)
  • The Elements of Statistical Learning (ESL)
  • An Introduction to Statistical Learning with Applications in R (ISLR)
  • Pattern Recognition and Machine Learning (Bishop)
  • Bayesian Data Analysis (BDA)