Enable use of sample_weight for model fitting, fix LGBM + categorical issue #139
Conversation
- It is not clear whether sample weights should be used only during training or also during evaluation; some users may prefer it one way or the other.
- The pre-commit hook was not used previously, so the code was black-formatted at the wrong line length. A sample_weight docstring was also added to fit_compute.
- sample_weight is optional in get_feature_shap_values_per_fold (a test fails otherwise).
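The pass-through pattern described above can be sketched in isolation. This is a minimal sketch, not the probatus implementation: `fit_compute` and `DummyModel` here are simplified stand-ins used only to show why keeping `sample_weight` optional preserves backward compatibility.

```python
class DummyModel:
    """Stand-in estimator that records the keyword arguments passed to fit."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        return self


def fit_compute(model, X, y, sample_weight=None):
    """Forward sample_weight to model.fit only when it is provided.

    Because the argument defaults to None, existing callers and tests that
    never pass weights keep working unchanged.
    """
    fit_kwargs = {}
    if sample_weight is not None:
        fit_kwargs["sample_weight"] = sample_weight
    model.fit(X, y, **fit_kwargs)
    return model


weighted = fit_compute(DummyModel(), X=[[0], [1]], y=[0, 1], sample_weight=[0.5, 2.0])
unweighted = fit_compute(DummyModel(), X=[[0], [1]], y=[0, 1])
```

With weights, the estimator's `fit` receives `sample_weight`; without them, it receives no extra keyword at all, which matters for estimators whose `fit` does not accept that argument.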
Looks very good already, great job! Only a couple of small comments; we can merge once you address them and release today or on Monday in 1.8.1.
For the tests, I would say it is safe to set approximate to False in the two failing tests. It should not have much impact, only slightly slower computation.
As a very simple test, you can edit one of the existing tests for ShapRFECV and pass sample_weights of all 1s. We can add more complex tests later.
Also, if you want, users would really benefit from a HowTo guide on using ShapRFECV with sample weights. You can add it to the docs/howto folder and register it in mkdocs.yml in the website structure. That one is optional.
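The suggested sanity test rests on a simple identity: all-ones weights must reproduce the unweighted result exactly. The same invariant in miniature, in plain Python and independent of probatus (`weighted_mean` is an illustrative helper, not part of the library):

```python
def weighted_mean(values, weights):
    """Weighted average; with unit weights this reduces to the plain mean."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


values = [1.0, 2.0, 3.0, 4.0]
unit_weights = [1.0] * len(values)

# sample_weight of all 1s must match the unweighted computation exactly --
# the same invariant the proposed ShapRFECV test would assert on its output.
assert weighted_mean(values, unit_weights) == sum(values) / len(values)
```

A ShapRFECV test built on this idea would run the selector once without weights and once with all-ones weights, then compare the selected feature sets.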
@@ -77,7 +78,6 @@ def shap_calc(
                "data transformations before running the probatus module."
            )
        )

    # Suppress warnings regarding XGboost and Lightgbm models.
    with warnings.catch_warnings():
I think this line will also suppress warnings raised later in the code.
It was meant only for the shap.Explainer line, because SHAP throws a lot of warnings. Maybe we can move it right above:

if verbose > 0:
    warnings.warn(
        "Using tree_dependent feature_perturbation (in shap) without background"
        " data for LGBM + categorical features."
    )
    explainer = shap.Explainer(model, **shap_kwargs)
else:
    explainer = shap.Explainer(model, masker=mask, **shap_kwargs)
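The scoping concern can be demonstrated with the standard library alone: catch_warnings only suppresses warnings raised inside its with block, so narrowing it to wrap just the Explainer construction leaves earlier warnings visible. This sketch uses a dummy `build_explainer` in place of shap.Explainer, not the probatus code:

```python
import warnings


def build_explainer():
    """Stand-in for shap.Explainer, which emits many warnings on construction."""
    warnings.warn("noisy third-party warning", UserWarning)
    return "explainer"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # Our own warning, raised before the suppression block: stays visible.
    warnings.warn("LGBM + categorical features without background data", UserWarning)

    # Suppress warnings only around the noisy constructor call.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        explainer = build_explainer()
```

Only the first warning survives in `caught`; a catch_warnings block placed higher up would have swallowed it too.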
- CatBoost is now also checked for, so that background data is not passed to shap
Great contribution, thank you!
I will release it now in 1.8.1.
It is released in 1.8.1.
See #106, #138.

I have left out the unit tests (for now). I'm not familiar with pytest syntax (I normally use unittest) and want to make sure that you're fine with the interfaces before testing them. I also noticed that several tests are failing due to using approximate=True with LGBM models in shap. I don't know much about that, so someone should have a look.

Note: I also removed the training set from the EarlyStopping eval_set argument; I didn't mention it in any of the commit messages though.
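Dropping the training fold from eval_set can be sketched as a small kwargs builder. This is a hypothetical helper, not the probatus code; `eval_sample_weight` mirrors the LightGBM-style fit parameter name, but treat the exact names as assumptions and check your estimator's fit signature.

```python
def early_stopping_fit_kwargs(X_val, y_val, sample_weight_val=None):
    """Build fit kwargs that monitor early stopping on the validation fold only.

    If the training fold were also included in eval_set, its loss would keep
    improving for most of the run, which can prevent or delay early stopping.
    """
    fit_kwargs = {"eval_set": [(X_val, y_val)]}  # training fold intentionally excluded
    if sample_weight_val is not None:
        # LightGBM-style name; assumption -- verify against your estimator.
        fit_kwargs["eval_sample_weight"] = [sample_weight_val]
    return fit_kwargs


kwargs = early_stopping_fit_kwargs([[0], [1]], [0, 1], sample_weight_val=[1.0, 1.0])
```

The returned dict would then be splatted into `model.fit(X_train, y_train, **kwargs)` so that early stopping watches only held-out performance.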