LightGBM + categorical features broken inside ShapRFECV #138
Comments
Great finding, I think this would be the way to go. By default probatus transforms the dataset into a DataFrame, and all categorical features get the "category" dtype. Indeed, not passing the mask when there are categorical features would be nice. I am curious whether this applies only to tree-based models: are there any linear ones that support categorical features? Feel free to pick this issue up! 👍
I'll submit a PR later today implementing both this and #106!
Good one! I think this will also solve the problems we were having with testing dummy data with a categorical column, won't it @Matgrb?
Covered in #139
Describe the bug
When using LightGBM on a dataset that includes pd.Categorical features, the shap Explainer fails and advises you to use `feature_perturbation="tree_path_dependent"`. However, since we're using LightGBM, shap already chooses this by default; the real issue is that background data is passed, which isn't supported together with `feature_perturbation="tree_path_dependent"`. The background data is passed as `mask` in `shap_calc()`.
To Reproduce
Use LightGBM in ShapRFECV on a dataset with categorical features.
Error traceback
Can't provide it right now, as I've already fixed this on my branch, but the error is raised from inside shap's `_tree.py`, as called inside `shap_calc()`.
Expected behavior
It runs without issue, as there is support for trees with categorical features in `shap`.
Proposed fix
Check the model type and the features of X inside `shap_calc()`, and avoid passing `mask` if there are categorical features and the model is tree-based.
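A minimal sketch of the proposed check, assuming the background data reaches `shap.Explainer` through its `masker` argument and that shap falls back to `tree_path_dependent` when no background data is given (as described above). The helper names `is_tree_model`, `has_categorical_features` and `explainer_kwargs`, and the class-name list used to recognise tree-based models, are hypothetical and not part of probatus:

```python
import pandas as pd

# Hypothetical list of tree-based estimator class names. Matching on names
# avoids importing LightGBM/XGBoost/CatBoost just to run isinstance() checks.
TREE_MODEL_NAMES = {
    "LGBMClassifier",
    "LGBMRegressor",
    "XGBClassifier",
    "XGBRegressor",
    "CatBoostClassifier",
    "CatBoostRegressor",
    "RandomForestClassifier",
    "RandomForestRegressor",
    "DecisionTreeClassifier",
    "DecisionTreeRegressor",
}


def is_tree_model(model) -> bool:
    """Heuristic: treat the model as tree-based if its class name is listed above."""
    return type(model).__name__ in TREE_MODEL_NAMES


def has_categorical_features(X: pd.DataFrame) -> bool:
    """True if at least one column uses the pandas "category" dtype."""
    return len(X.select_dtypes(include="category").columns) > 0


def explainer_kwargs(model, X: pd.DataFrame, mask) -> dict:
    """Build keyword arguments for the SHAP explainer (hypothetical helper).

    For tree-based models on data with categorical features, the background
    data (mask) is left out, so the explainer is created without a masker.
    In every other case the mask is passed through unchanged.
    """
    if is_tree_model(model) and has_categorical_features(X):
        return {}
    return {"masker": mask}
```

Inside `shap_calc()` this could then be used as `shap.Explainer(model, **explainer_kwargs(model, X, mask))`. An `isinstance` check against the actual estimator classes would be stricter than matching class names, at the cost of importing each library.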