Enable use of sample_weight for model fitting, fix LGBM + categorical issue #139

Merged (7 commits) on Apr 18, 2021

Conversation

@timlod (Contributor) commented Apr 16, 2021

See #106, #138.

I have left out the unit tests for now. I'm not familiar with pytest syntax (I normally use unittest) and want to make sure you're fine with the interfaces before testing them. I also noticed that several tests are failing due to using approximate=True with LGBM models in shap. I'm not sure about that, so someone should have a look.

Note: I also removed the training set from the EarlyStopping eval_set argument - this isn't mentioned in any of the commit messages.
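
A minimal sketch of how the new sample_weight argument is meant to be used with ShapRFECV (the data, model, and constructor arguments here are hypothetical; the exact signature is defined in the PR diff):

    import numpy as np
    import pandas as pd
    from lightgbm import LGBMClassifier
    from probatus.feature_elimination import ShapRFECV

    # Hypothetical toy data; any tabular dataset works here.
    X = pd.DataFrame(np.random.rand(100, 5), columns=[f"f{i}" for i in range(5)])
    y = pd.Series(np.random.randint(0, 2, size=100))
    weights = np.random.uniform(0.5, 2.0, size=len(X))  # per-row sample weights

    shap_elimination = ShapRFECV(LGBMClassifier(), step=1, cv=5, scoring="roc_auc")
    # sample_weight (added in this PR) is forwarded to the model's fit() in each CV fold.
    report = shap_elimination.fit_compute(X, y, sample_weight=weights)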

timlod added 5 commits April 15, 2021 12:54
- not clear whether sample weights should be used only for training, or also during evaluation - some may prefer it one way or the other (illustrated in the sketch after this list)
- previously didn't use the pre-commit hook, hence the code was black-formatted at the wrong line length
- also added a sample_weight docstring to fit_compute
- sample_weight made optional in get_feature_shap_values_per_fold (failing test otherwise)
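
To illustrate the training-vs-evaluation distinction from the first note: with the LightGBM scikit-learn API, training weights and evaluation weights are passed separately, so supporting one does not automatically cover the other (illustrative snippet with hypothetical data; only the weight-related parameters are the point here):

    import numpy as np
    from lightgbm import LGBMClassifier

    # Hypothetical train/validation split and weights.
    X_train, X_val = np.random.rand(80, 5), np.random.rand(20, 5)
    y_train, y_val = np.random.randint(0, 2, 80), np.random.randint(0, 2, 20)
    w_train, w_val = np.ones(80), np.ones(20)

    model = LGBMClassifier(n_estimators=100)
    model.fit(
        X_train,
        y_train,
        sample_weight=w_train,        # weights used when fitting
        eval_set=[(X_val, y_val)],    # validation data only, without the training set
        eval_sample_weight=[w_val],   # weights used when scoring eval_set (e.g. for early stopping)
    )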
@Matgrb (Contributor) left a comment

Looks very good already, great job! Only a couple of small comments; we can merge it once you address them and release it today or on Monday in 1.8.1.

For the tests, I would say it is safe to set approximate to False in the two failing tests. It should not have much impact beyond slightly slower computation.

As a very simple test, you can edit one of the existing ShapRFECV tests and pass sample_weights of all 1s. Later we can add more complex tests.

Also, if you want, users would really benefit from a HowTo guide on using ShapRFECV with sample weights. You can add it to the docs/howto folder and add it to mkdocs.yml in the website structure. That one is optional.
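
A sketch of that simple test in pytest style (the fixture, test name, and final assertion are illustrative, not taken from the actual test suite):

    import numpy as np
    import pandas as pd
    import pytest
    from sklearn.tree import DecisionTreeClassifier
    from probatus.feature_elimination import ShapRFECV


    @pytest.fixture
    def X_y():
        # Small synthetic dataset, enough for a smoke test.
        X = pd.DataFrame(np.random.rand(50, 4), columns=["f1", "f2", "f3", "f4"])
        y = pd.Series(np.random.randint(0, 2, size=50))
        return X, y


    def test_shap_rfe_with_unit_sample_weight(X_y):
        X, y = X_y
        weights = np.ones(len(y))  # all-ones weights should behave like an unweighted fit
        shap_elimination = ShapRFECV(
            DecisionTreeClassifier(max_depth=1), step=1, cv=2, scoring="roc_auc"
        )
        report = shap_elimination.fit_compute(X, y, sample_weight=weights)
        assert not report.empty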

probatus/feature_elimination/feature_elimination.py (review comment resolved)
probatus/feature_elimination/feature_elimination.py (review comment resolved)
probatus/utils/shap_helpers.py (review comment resolved)
@@ -77,7 +78,6 @@ def shap_calc(
"data transformations before running the probatus module."
)
)

# Suppress warnings regarding XGboost and Lightgbm models.
with warnings.catch_warnings():
Contributor


I think this line will suppress the warnings that you get later in the code.

It was used only for the shap.Explainer line of code, because SHAP throws a lot of warnings. Maybe we can move it right above:

            if verbose > 0:
                warnings.warn(
                    "Using tree_dependent feature_perturbation (in shap) without background"
                    " data for LGBM + categorical features."
                )
            explainer = shap.Explainer(model, **shap_kwargs)
        else:
            explainer = shap.Explainer(model, masker=mask, **shap_kwargs)
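
One way the suggestion could look (a sketch only, not the merged code; the helper name and the lgbm_with_categoricals flag are hypothetical stand-ins for the branch condition inside shap_calc):

    import warnings
    import shap


    def _build_explainer(model, mask, lgbm_with_categoricals, verbose=0, **shap_kwargs):
        # Emit the user-facing warning before entering the suppression context,
        # so it is not swallowed together with SHAP's own warnings.
        if lgbm_with_categoricals:
            if verbose > 0:
                warnings.warn(
                    "Using tree_dependent feature_perturbation (in shap) without background"
                    " data for LGBM + categorical features."
                )
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # SHAP throws a lot of warnings here
                return shap.Explainer(model, **shap_kwargs)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return shap.Explainer(model, masker=mask, **shap_kwargs)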

- catboost is now also checked for, so that background data is not passed to shap
@Matgrb (Contributor) left a comment

Great contribution, thank you!

I will release it now in 1.8.1.

@Matgrb (Contributor) commented Apr 18, 2021

It is released in 1.8.1

https://github.com/ing-bank/probatus/releases/tag/v1.8.1
