Compatibility early stopping catboost #175
Conversation
Looks very good to me. The refactoring of getting the fitting params makes things clearer as well. A few comments:
Thanks for your comments @operte, I have addressed the first three in my latest commits. About the last comment, I am not entirely sure the discussion section is the most appropriate place: right now it is very high level, and this point is very specific to one class. Shall we maybe write something in the class docstring instead?
Good point @ClaudioSalvatoreArcidiacono, maybe a short comment in the class docstring is more appropriate! What about the failing tests? Did you figure out why they suddenly started failing for lgbm and xgb? Do they fail in your local environment or also in GitHub's pipelines?
Changes are looking great!
I left some comments, let me know what you think!
setup.py (outdated diff):
```diff
@@ -18,7 +18,9 @@ def read(fname):
     "tqdm>=4.41.0",
     "shap >= 0.38.1, < 0.39.0",  # 0.40.0 causes issues in certain plots. For now it is excluded
     "numpy>=1.19.0",
-    "lightgbm>=3.3.0"
+    "lightgbm>=3.3.0",
```
Since we no longer import lightgbm at the top of the file, these three packages can go to the extra dependencies. If someone wants to use one of these models in SHAP RFECV, they will take care of installing it themselves, since they will need to pass the model instance anyway.
I would put these three into extra dependencies. What do you think?
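For illustration, a minimal sketch of what that could look like in setup.py; the extra name "models" and the xgboost/catboost version pins are assumptions, not probatus's actual configuration:

```python
# Assumption-based sketch: the boosting libraries become optional extras.
from setuptools import find_packages, setup

setup(
    name="probatus",
    packages=find_packages(),
    install_requires=[
        "tqdm>=4.41.0",
        "shap>=0.38.1,<0.39.0",
        "numpy>=1.19.0",
    ],
    extras_require={
        # Installed only on demand, e.g. `pip install probatus[models]`.
        "models": [
            "lightgbm>=3.3.0",
            "xgboost>=1.5.0",   # hypothetical pin
            "catboost>=1.0.0",  # hypothetical pin
        ],
    },
)
```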
To solve this issue I have taken a slightly unorthodox approach.
The basic idea is that if a user is passing a model from CatBoost, XGBoost or LightGBM to RFECV, they already have that library installed in their environment. So I thought to surround each import with a try/except statement and to ignore import errors. Please let me know if you are OK with it.
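For context, a minimal sketch of the guarded-import pattern described here (the exact import targets are assumptions, not necessarily the PR's code):

```python
# A missing package is silently tolerated at import time; it only matters
# once the user actually passes a model of that type.
try:
    from catboost import CatBoostClassifier, CatBoostRegressor
except ImportError:
    CatBoostClassifier = CatBoostRegressor = None

try:
    from lightgbm import LGBMModel
except ImportError:
    LGBMModel = None

try:
    from xgboost.sklearn import XGBModel
except ImportError:
    XGBModel = None
```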
Not unorthodox at all actually :) You can take this a step further and replace the import with a 'mock' instance whose `__call__` raises a helpful 'not installed' error only when the import is actually used. probatus already supports this; you can use `utils.exceptions.NotInstalledError`. See `probatus/utils/exceptions.py`, line 57 in 5c5b0aa (`class NotInstalledError:`).
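A minimal sketch of that mock pattern follows; the attribute and parameter names are assumptions, not necessarily probatus's exact implementation of `NotInstalledError`:

```python
class NotInstalledError:
    """Stand-in for an optional dependency that is not installed.

    The helpful error is raised only when the stand-in is actually used.
    """

    def __init__(self, tool, pip_name=None):
        self.message = f"{tool} is not installed."
        if pip_name is not None:
            self.message += f" Try: pip install {pip_name}"

    def __call__(self, *args, **kwargs):
        raise ModuleNotFoundError(self.message)


# At import time, a failed import is replaced by the stand-in:
try:
    from catboost import CatBoostClassifier
except ModuleNotFoundError:
    CatBoostClassifier = NotInstalledError("CatBoostClassifier", "catboost")
```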
Hi @timvink, thanks for joining this review :).
Sorry, I did not understand what you mean when you say:
> ... replace the import with a 'mock' instance ...
The intention of my code is the following:

1. I am a user of Probatus who likes to use LightGBM.
2. I already have a virtual environment where Probatus and LightGBM are installed.
3. I use the Probatus class RFECV, passing the LightGBM classifier instance I created earlier in my code.
4. Probatus tries to import catboost, but this fails because the user has only installed LightGBM; the import error is ignored.
5. Probatus tries to import lightgbm, and it succeeds.
6. Probatus creates fit params compatible with LightGBM.

Now imagine the user wants to pass something else to RFECV, like a RandomForestClassifier from sklearn. All of the imports will fail and the user will receive a "model not supported" error.
It seems like legitimate behaviour to me.
What would you suggest instead :) ?
No problem, still keeping an eye on the probatus development :)
I didn't look in enough detail; I thought you had a top-level `try: import .. except: pass`, and then `NotInstalledError` can help with a 'mock' to create a more helpful error message.
I looked at your implementation of `_get_fit_params` and I think it's quite elegant actually!
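For readers of this thread, a rough sketch of the dispatch idea behind `_get_fit_params` (assumption-based, not the PR's actual code): build the early-stopping fit kwargs that match whichever library the user's model comes from.

```python
def _get_fit_params(clf, X_val, y_val, early_stopping_rounds, eval_metric):
    """Return extra keyword arguments for clf.fit() that enable early stopping."""
    try:
        from lightgbm import LGBMModel

        if isinstance(clf, LGBMModel):
            return {
                "eval_set": [(X_val, y_val)],
                "eval_metric": eval_metric,
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    try:
        from xgboost.sklearn import XGBModel

        if isinstance(clf, XGBModel):
            return {
                "eval_set": [(X_val, y_val)],
                "eval_metric": eval_metric,
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    try:
        from catboost import CatBoost

        if isinstance(clf, CatBoost):
            # CatBoost takes the eval metric in the model constructor, so only
            # the eval set and the patience are passed at fit time.
            return {
                "eval_set": [(X_val, y_val)],
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    raise ValueError(f"Model type {type(clf)} is not supported for early stopping.")
```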
Thanks :), happy to hear that you are still hanging around ;)
```diff
@@ -322,8 +323,47 @@ def test_shap_rfe_early_stopping(complex_data, capsys):
     assert len(out) == 0


+def test_shap_rfe_early_stopping_CatBoost(complex_data, capsys):
```
Can you also add one test for XGBoost, just to make sure it works as well?
Added, though something a bit weird happened: in the last test XGBoost gets a test score of 0.3333.
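A hypothetical sketch of the requested XGBoost test, mirroring the CatBoost one; the `EarlyStoppingShapRFECV` signature, the `fit_compute` call, and the contents of the `complex_data` fixture are assumptions based on this conversation, not the PR's actual code:

```python
from probatus.feature_elimination import EarlyStoppingShapRFECV
from xgboost import XGBClassifier


def test_shap_rfe_early_stopping_XGBoost(complex_data, capsys):
    X, y = complex_data
    clf = XGBClassifier(n_estimators=200, random_state=42)

    shap_elimination = EarlyStoppingShapRFECV(
        clf,
        step=1,
        cv=2,
        scoring="roc_auc",
        early_stopping_rounds=5,
        eval_metric="auc",
    )
    report = shap_elimination.fit_compute(X, y)

    # At least one elimination round should be recorded in the report.
    assert report.shape[0] > 0
```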
Hi @Matgrb, thanks for your comments. I will try to address them in the next commits. Do you have any idea why the tests on Mac and Windows are failing? It looks like something related to …
Hi @Matgrb, can we rerun the workflows? I think they failed due to a temporary issue. Other than that, all of the comments should now have been solved :). Let me know if you have other comments.
Thanks for re-triggering the workflow. It is still failing. On the positive side, I managed to reproduce the error locally:

```
ValueError: I/O operation on closed file
```

I am going to investigate it and keep you updated.
The issue with the tests should now be fixed. Apparently, if we import CatBoostClassifier multiple times (for example, in more than one test) something weird happens. This is probably due to the initialisation of catboost. The solution for me was to import the CatBoostClassifier class in a session-scoped fixture.
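A sketch of that session-scoped fixture approach (fixture and test names are assumptions): catboost is imported exactly once per test session and handed to the tests that need it.

```python
import pytest


@pytest.fixture(scope="session")
def catboost_classifier_class():
    """Import CatBoostClassifier only once for the whole test session."""
    from catboost import CatBoostClassifier

    return CatBoostClassifier


def test_catboost_model_builds(catboost_classifier_class):
    # Tests request the fixture instead of importing catboost themselves.
    clf = catboost_classifier_class(n_estimators=10, verbose=0)
    assert clf is not None
```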
That's a nasty error to debug, nice find. Could that happen to users as well? If so, perhaps a more robust solution would be to do something like:

```python
import sys

if "catboost" not in sys.modules:
    import catboost
```
Thank you for the effort, the changes look great!
I will release a new version of probatus to PyPI today! 🎉
This PR addresses issue #146