My submission to the Kaggle housing prices competition
The final score was 0.11717, a top 3% submission. See the summary I posted after the competition.
The Datasets folder holds the original competition data; the Output folder holds generated outputs, mainly processed train/test dataframes and trained models.
The numbered notebooks are the core steps of the pipeline; notebooks 1, 2, and 4 are essential.
- EDA, exploratory analysis of the raw data
- Feature Engineering, creating additional features (first sketch after this list)
- Feature Selection, trimming the feature set with the Boruta algorithm (second sketch)
- Model Selection, trying out different models to see which performs best (third sketch)
- Model Interpretation, with the awesome shap package; one notebook for a CatBoost model, one for a Lasso model, and neither is clearly superior (fourth sketch)
- Deep Learning, an attempt to apply fastai to the dataset; it did not give a good result
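To give a feel for the feature-engineering step, here is a minimal sketch of the kind of derived features that are common for this dataset (assuming the House Prices: Advanced Regression Techniques data). The file path and the specific features are illustrative, not necessarily the ones the notebook creates:

```python
import pandas as pd

# Illustrative path into the Datasets folder described above.
train = pd.read_csv("Datasets/train.csv")

# Combine related raw columns into more informative aggregate features.
train["TotalSF"] = train["TotalBsmtSF"] + train["1stFlrSF"] + train["2ndFlrSF"]
train["TotalBath"] = (train["FullBath"] + 0.5 * train["HalfBath"]
                      + train["BsmtFullBath"] + 0.5 * train["BsmtHalfBath"])
train["HouseAge"] = train["YrSold"] - train["YearBuilt"]
```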
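The feature-selection step uses the Boruta algorithm. A minimal sketch with the BorutaPy implementation follows; it assumes a fully numeric, engineered training frame, and the file path is hypothetical:

```python
import pandas as pd
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor

# Hypothetical output of the feature-engineering notebook.
train = pd.read_csv("Output/train_engineered.csv")
X = train.drop(columns=["SalePrice"]).values
y = train["SalePrice"].values

# Boruta pits each real feature against randomly shuffled "shadow" copies
# and keeps only the features that consistently beat the shadows in
# random-forest importance.
forest = RandomForestRegressor(n_jobs=-1, max_depth=5)
selector = BorutaPy(forest, n_estimators="auto", random_state=42)
selector.fit(X, y)

kept = train.drop(columns=["SalePrice"]).columns[selector.support_]
print(f"Boruta kept {len(kept)} of {X.shape[1]} features")
```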
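For model selection, a reasonable way to compare candidates is cross-validated RMSE on the log-transformed target, which matches how the House Prices leaderboard is scored. The models, hyperparameters, and path below are illustrative, not the notebook's exact choices:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor

# Hypothetical output of the feature-selection step.
train = pd.read_csv("Output/train_selected.csv")
X = train.drop(columns=["SalePrice"])
y_log = np.log1p(train["SalePrice"])  # leaderboard scores RMSE on log prices

candidates = {
    "lasso": Lasso(alpha=0.0005, max_iter=10000),
    "random_forest": RandomForestRegressor(n_estimators=300, n_jobs=-1),
    "catboost": CatBoostRegressor(verbose=0),
}
for name, model in candidates.items():
    rmse = -cross_val_score(model, X, y_log, cv=5,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: {rmse.mean():.5f} +/- {rmse.std():.5f}")
```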
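Finally, a minimal sketch of the shap interpretation step for the CatBoost model; the path and model parameters are again illustrative:

```python
import pandas as pd
import shap
from catboost import CatBoostRegressor

# Same hypothetical selected-feature frame as above.
train = pd.read_csv("Output/train_selected.csv")
X = train.drop(columns=["SalePrice"])
y = train["SalePrice"]

model = CatBoostRegressor(verbose=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles; the summary
# plot ranks features by their average impact on the predicted price.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```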
Thanks to Kaggle for making such a great platform for competitive data science.