Enable use of sample_weight for model fitting, fix LGBM + categorical issue #139
Conversation
- It is not clear whether sample weights should be used only during training or also during evaluation; some users may prefer it one way or the other.
- The pre-commit hook was not used previously, so the code was black-formatted at the wrong line length. A sample_weight docstring was also added to fit_compute.
- sample_weight is optional in get_feature_shap_values_per_fold (a test fails otherwise).
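The pass-through pattern described above can be sketched in isolation. This is a minimal sketch, not the probatus implementation: `fit_compute` and `DummyModel` here are simplified stand-ins used only to show why keeping `sample_weight` optional preserves backward compatibility.

```python
class DummyModel:
    """Stand-in estimator that records the keyword arguments passed to fit."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        return self


def fit_compute(model, X, y, sample_weight=None):
    """Forward sample_weight to model.fit only when it is provided.

    Because the argument defaults to None, existing callers and tests that
    never pass weights keep working unchanged.
    """
    fit_kwargs = {}
    if sample_weight is not None:
        fit_kwargs["sample_weight"] = sample_weight
    model.fit(X, y, **fit_kwargs)
    return model


weighted = fit_compute(DummyModel(), X=[[0], [1]], y=[0, 1], sample_weight=[0.5, 2.0])
unweighted = fit_compute(DummyModel(), X=[[0], [1]], y=[0, 1])
```

With weights, the estimator's `fit` receives `sample_weight`; without them, it receives no extra keyword at all, which matters for estimators whose `fit` does not accept that argument.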
Looks very good already, great job! Only a couple of small comments; we can merge once you address them and release today or on Monday in 1.8.1.
For the tests, I would say it is safe to set approximate to False in the two failing tests. It should not have much impact, only slightly slower computation.
As a very simple test, you can edit one of the existing tests for ShapRFECV and pass sample_weights of all 1s. We can add more complex tests later.
Also, if you want, users would really benefit from a HowTo guide on using ShapRFECV with sample weights. You can add it to the docs/howto folder and register it in mkdocs.yml in the website structure. That one is optional.
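The suggested sanity test rests on a simple identity: all-ones weights must reproduce the unweighted result exactly. The same invariant in miniature, in plain Python and independent of probatus (`weighted_mean` is an illustrative helper, not part of the library):

```python
def weighted_mean(values, weights):
    """Weighted average; with unit weights this reduces to the plain mean."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


values = [1.0, 2.0, 3.0, 4.0]
unit_weights = [1.0] * len(values)

# sample_weight of all 1s must match the unweighted computation exactly --
# the same invariant the proposed ShapRFECV test would assert on its output.
assert weighted_mean(values, unit_weights) == sum(values) / len(values)
```

A ShapRFECV test built on this idea would run the selector once without weights and once with all-ones weights, then compare the selected feature sets.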
@@ -77,7 +78,6 @@ def shap_calc(
                "data transformations before running the probatus module."
            )
        )

    # Suppress warnings regarding XGboost and Lightgbm models.
    with warnings.catch_warnings():
I think this line will also suppress warnings raised later in the code.
It was meant only for the shap.Explainer line, because SHAP throws a lot of warnings. Maybe we can move it right above:

if verbose > 0:
    warnings.warn(
        "Using tree_dependent feature_perturbation (in shap) without background"
        " data for LGBM + categorical features."
    )
    explainer = shap.Explainer(model, **shap_kwargs)
else:
    explainer = shap.Explainer(model, masker=mask, **shap_kwargs)
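The scoping concern can be demonstrated with the standard library alone: catch_warnings only suppresses warnings raised inside its with block, so narrowing it to wrap just the Explainer construction leaves earlier warnings visible. This sketch uses a dummy `build_explainer` in place of shap.Explainer, not the probatus code:

```python
import warnings


def build_explainer():
    """Stand-in for shap.Explainer, which emits many warnings on construction."""
    warnings.warn("noisy third-party warning", UserWarning)
    return "explainer"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # Our own warning, raised before the suppression block: stays visible.
    warnings.warn("LGBM + categorical features without background data", UserWarning)

    # Suppress warnings only around the noisy constructor call.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        explainer = build_explainer()
```

Only the first warning survives in `caught`; a catch_warnings block placed higher up would have swallowed it too.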
- CatBoost is now also checked for, so that background data is not passed to shap
Great contribution, thank you!
I will release it now in 1.8.1.
It is released in 1.8.1.
See #106, #138.

I have left out the unit tests (for now). I'm not familiar with pytest syntax (I normally use unittest) and want to make sure that you're fine with the interfaces before testing them. I also noticed that several tests are failing due to using approximate=True with LGBM models in shap. I don't know much about that, so someone should have a look.

Note: I also removed the training set from the EarlyStopping eval_set argument; I didn't mention it in any of the commit messages though.
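Dropping the training fold from eval_set can be sketched as a small kwargs builder. This is a hypothetical helper, not the probatus code; `eval_sample_weight` mirrors the LightGBM-style fit parameter name, but treat the exact names as assumptions and check your estimator's fit signature.

```python
def early_stopping_fit_kwargs(X_val, y_val, sample_weight_val=None):
    """Build fit kwargs that monitor early stopping on the validation fold only.

    If the training fold were also included in eval_set, its loss would keep
    improving for most of the run, which can prevent or delay early stopping.
    """
    fit_kwargs = {"eval_set": [(X_val, y_val)]}  # training fold intentionally excluded
    if sample_weight_val is not None:
        # LightGBM-style name; assumption -- verify against your estimator.
        fit_kwargs["eval_sample_weight"] = [sample_weight_val]
    return fit_kwargs


kwargs = early_stopping_fit_kwargs([[0], [1]], [0, 1], sample_weight_val=[1.0, 1.0])
```

The returned dict would then be splatted into `model.fit(X_train, y_train, **kwargs)` so that early stopping watches only held-out performance.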