My submission to the Kaggle housing prices competition
The final score was 0.11717, a top 3% submission. See the summary I posted after the competition.
The Datasets folder holds the original competition data; the Output folder holds generated outputs, mainly processed train/test dataframes and trained models.
The numbered notebooks are the core steps of the pipeline; notebooks 1, 2, and 4 are essential.
- EDA, exploratory analysis of the raw data
- Feature Engineering, creating additional features (first sketch after this list)
- Feature Selection, trimming the feature set with the Boruta algorithm (second sketch)
- Model Selection, trying out different models to see which performs best (third sketch)
- Model Interpretation, with the awesome shap package; one notebook for a CatBoost model, one for a Lasso model, and neither is clearly superior (fourth sketch)
- Deep Learning, an attempt to apply fastai to the dataset; it did not give a good result
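To give a feel for the feature-engineering step, here is a minimal sketch of the kind of derived features that are common for this dataset (assuming the House Prices: Advanced Regression Techniques data). The file path and the specific features are illustrative, not necessarily the ones the notebook creates:

```python
import pandas as pd

# Illustrative path into the Datasets folder described above.
train = pd.read_csv("Datasets/train.csv")

# Combine related raw columns into more informative aggregate features.
train["TotalSF"] = train["TotalBsmtSF"] + train["1stFlrSF"] + train["2ndFlrSF"]
train["TotalBath"] = (train["FullBath"] + 0.5 * train["HalfBath"]
                      + train["BsmtFullBath"] + 0.5 * train["BsmtHalfBath"])
train["HouseAge"] = train["YrSold"] - train["YearBuilt"]
```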
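The feature-selection step uses the Boruta algorithm. A minimal sketch with the BorutaPy implementation follows; it assumes a fully numeric, engineered training frame, and the file path is hypothetical:

```python
import pandas as pd
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor

# Hypothetical output of the feature-engineering notebook.
train = pd.read_csv("Output/train_engineered.csv")
X = train.drop(columns=["SalePrice"]).values
y = train["SalePrice"].values

# Boruta pits each real feature against randomly shuffled "shadow" copies
# and keeps only the features that consistently beat the shadows in
# random-forest importance.
forest = RandomForestRegressor(n_jobs=-1, max_depth=5)
selector = BorutaPy(forest, n_estimators="auto", random_state=42)
selector.fit(X, y)

kept = train.drop(columns=["SalePrice"]).columns[selector.support_]
print(f"Boruta kept {len(kept)} of {X.shape[1]} features")
```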
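For model selection, a reasonable way to compare candidates is cross-validated RMSE on the log-transformed target, which matches how the House Prices leaderboard is scored. The models, hyperparameters, and path below are illustrative, not the notebook's exact choices:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor

# Hypothetical output of the feature-selection step.
train = pd.read_csv("Output/train_selected.csv")
X = train.drop(columns=["SalePrice"])
y_log = np.log1p(train["SalePrice"])  # leaderboard scores RMSE on log prices

candidates = {
    "lasso": Lasso(alpha=0.0005, max_iter=10000),
    "random_forest": RandomForestRegressor(n_estimators=300, n_jobs=-1),
    "catboost": CatBoostRegressor(verbose=0),
}
for name, model in candidates.items():
    rmse = -cross_val_score(model, X, y_log, cv=5,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: {rmse.mean():.5f} +/- {rmse.std():.5f}")
```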
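Finally, a minimal sketch of the shap interpretation step for the CatBoost model; the path and model parameters are again illustrative:

```python
import pandas as pd
import shap
from catboost import CatBoostRegressor

# Same hypothetical selected-feature frame as above.
train = pd.read_csv("Output/train_selected.csv")
X = train.drop(columns=["SalePrice"])
y = train["SalePrice"]

model = CatBoostRegressor(verbose=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles; the summary
# plot ranks features by their average impact on the predicted price.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```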
Thanks to Kaggle for making such a great platform for competitive data science.