Commit

Set OneHotEncoder's handle_unknown='ignore' to avoid warnings
ageron committed Oct 11, 2021
1 parent e157a16 commit 7499570
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions 02_end_to_end_machine_learning_project.ipynb
@@ -6035,6 +6035,13 @@
"**Warning**: the following cell may take close to 45 minutes to run, or more depending on your hardware."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** In the code below, I've set the `OneHotEncoder`'s `handle_unknown` hyperparameter to `'ignore'` to avoid warnings during training. By default, `OneHotEncoder` uses `handle_unknown='error'`, meaning that it raises an error when transforming any data containing a category it did not see during training. If we kept the default, `GridSearchCV` would run into errors when evaluating any fold whose training split does not contain every category. This is likely to happen, since there's only one sample in the `'ISLAND'` category, and it may end up in the validation split of some folds. `GridSearchCV` would then just drop those folds, and it's best to avoid that."
]
},
{
"cell_type": "code",
"execution_count": 137,
@@ -6187,6 +6194,8 @@
}
],
"source": [
"full_pipeline.named_transformers_[\"cat\"].handle_unknown = 'ignore'\n",
"\n",
"param_grid = [{\n",
" 'preparation__num__imputer__strategy': ['mean', 'median', 'most_frequent'],\n",
" 'feature_selection__k': list(range(1, len(feature_importances) + 1))\n",
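A minimal sketch of the behavior described in the note above. It assumes a hypothetical toy column (the values are illustrative, not the real housing data) in which `'ISLAND'` appears only once. For each split of an unshuffled 4-fold `KFold`, it fits a fresh `OneHotEncoder` on the training rows and transforms the held-out rows, once with `handle_unknown='error'` and once with `handle_unknown='ignore'`:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for a categorical column: 'ISLAND' appears only once.
X = np.array([["INLAND"], ["NEAR BAY"], ["<1H OCEAN"], ["INLAND"],
              ["NEAR BAY"], ["ISLAND"], ["<1H OCEAN"], ["INLAND"]])


def check_folds(handle_unknown):
    """Fit an encoder on each training split and transform the held-out split.

    Returns one entry per fold: the dense encoded validation rows, or the
    error message if transform() raised a ValueError.
    """
    results = []
    for train_idx, val_idx in KFold(n_splits=4).split(X):
        encoder = OneHotEncoder(handle_unknown=handle_unknown)
        encoder.fit(X[train_idx])  # categories seen in this training split only
        try:
            results.append(encoder.transform(X[val_idx]).toarray())
        except ValueError as exc:  # raised for unseen categories with 'error'
            results.append(str(exc))
    return results


print(["error" if isinstance(r, str) else "ok" for r in check_folds("error")])
# ['ok', 'ok', 'error', 'ok']
print(check_folds("ignore")[2].tolist())
# [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

With `'error'`, only the fold whose validation split contains the `'ISLAND'` row fails. With `'ignore'`, every fold succeeds and that row is simply encoded as all zeros.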

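The `param_grid` keys in the diff address nested hyperparameters with scikit-learn's `<step>__<param>` naming (for example `preparation__num__imputer__strategy`). The sketch below applies that same naming with `set_params()` to set `handle_unknown`, then checks that the value is kept by `sklearn.base.clone`, which `GridSearchCV` uses to copy its estimator. It assumes a hypothetical, much smaller stand-in for the notebook's pipeline: a single-step `Pipeline` named `"preparation"` wrapping a `ColumnTransformer` whose column names are placeholders.

```python
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical miniature preparation pipeline; column names are placeholders.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, ["median_income"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])
# Hypothetical single-step wrapper so the keys match the "preparation__..." prefix.
prep_only = Pipeline([("preparation", full_pipeline)])

# Set the nested encoder hyperparameter on the (unfitted) specification.
prep_only.set_params(preparation__cat__handle_unknown="ignore")

# Clone the wrapper and inspect the cloned parameters.
cloned = clone(prep_only)
params = cloned.get_params()
print(params["preparation__cat__handle_unknown"])  # ignore
print(params["preparation__num__imputer__strategy"])  # median
```

Because `set_params()` acts on the estimator specification rather than on fitted attributes, the setting survives cloning.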