Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catboost + categorical features broken inside ShapRFECV #147

Closed
oleg-savko opened this issue May 4, 2021 · 8 comments · Fixed by #148
Closed

Catboost + categorical features broken inside ShapRFECV #147

oleg-savko opened this issue May 4, 2021 · 8 comments · Fixed by #148
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@oleg-savko
Copy link

Its reference the same as bug: #138

But for catboost this part of code not work:

model_str = str(model) if not isinstance(model, Pipeline) else str(model[-1])
if (X.select_dtypes("category").shape[1] > 0) and model_str.startswith(("LGBM", "CatBoost")):
    if verbose > 0:
        warnings.warn(
            "Using tree_dependent feature_perturbation (in shap) without background"
            " data for LGBM + categorical features."
        )
    explainer = shap.Explainer(model, **shap_kwargs)
else:
    explainer = shap.Explainer(model, masker=mask, **shap_kwargs)

Because for catboost str return in another format and not detect:

print(str(CatBoostClassifier(verbose=25, early_stopping_rounds=50,
                               scale_pos_weight=scale_pos_weight,
                               one_hot_max_size=50,
                               iterations=3000, learning_rate=0.03,
                               random_seed=42)))

<catboost.core.CatBoostClassifier object at 0x7fb2f164ec10>

So need very simple fix to correct catboost model detection.

@oleg-savko oleg-savko added the bug Something isn't working label May 4, 2021
@Matgrb Matgrb added the good first issue Good for newcomers label May 4, 2021
@Matgrb
Copy link
Contributor

Matgrb commented May 4, 2021

Great find.
I think we can completely skip this part and model_str.startswith(("LGBM", "CatBoost"), since for any tree based model that supports categoricals we don't want to pass mask. @timlod what do you think?

Feel free to pick up this issue.

@cerlymarco
Copy link

CatBoost has its own ShapRFE: https://catboost.ai/docs/concepts/python-reference_catboost_select_features.html

@timlod
Copy link
Contributor

timlod commented May 4, 2021

@Matgrb I agree, however we need to be aware that it will fail for users who don't know whether their algorithm supports Categorical or not. This means that if someone provides Categorical we assume that the model must accept it. That's a fine strategy, but may yield confusing results for uninformed users.

@Matgrb
Copy link
Contributor

Matgrb commented May 4, 2021

If the model does not accept categorical, this the method will already fail when the model is being fitted, before getting the shap values.

@timlod
Copy link
Contributor

timlod commented May 4, 2021

In that case, given that only tree-based models currently accept Categorial, and they require no background data to be passed, it indeed only makes sense to remove the model check there! I can submit a PR, but it's probably easier if you just push a quick commit removing the model check.

@Matgrb
Copy link
Contributor

Matgrb commented May 4, 2021

I made a PR, once we merge it i will release a new version of probatus.

@Matgrb
Copy link
Contributor

Matgrb commented May 4, 2021

It is now released in probatus 1.8.2 . Can you confirm if it works for you @oleg-savko ?

@oleg-savko
Copy link
Author

It is now released in probatus 1.8.2 . Can you confirm if it works for you @oleg-savko ?

Checked it, seems like version probatus==1.8.2 work fine now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants