Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility early stopping catboost #175

Conversation

ClaudioSalvatoreArcidiacono
Copy link

@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono commented Nov 30, 2021

This PR addresses issue #146

  • adds compatibility for CatBoost models to EarlyStoppingShapRFECV

@operte
Copy link
Contributor

operte commented Nov 30, 2021

Looks very good to me. The refactoring of getting the fitting params makes things clearer as well.

A few comments:

  1. do you know why the LGBM tests are failing? I think they were working before. It seems they are failing for xgb and not for lgbm.
  2. could you specify on the init of earlystoppingshaprfecv that we just support xgb, catboost and lgbm?
  3. same comment as the previous, but for the [example notebook[(https://github.com/ing-bank/probatus/blob/main/docs/tutorials/nb_shap_feature_elimination.ipynb)
  4. should we somehow ask users to alert us if there's a new fancy model that demands support? we could have a new page in the discussion section

@ClaudioSalvatoreArcidiacono
Copy link
Author

ClaudioSalvatoreArcidiacono commented Dec 1, 2021

Thanks for your comments @operte, I have addressed the first 3 in my latest commits.

About the last comment, I am not entirely sure if the discussion section is the most appropriate place, right now it is very high level and this point is very specific to one class. Shall we maybe write something in the class docstring instead?

@operte
Copy link
Contributor

operte commented Dec 2, 2021

Good point @ClaudioSalvatoreArcidiacono , maybe a short comment in the class docstring is more appropriate!

What about the failing tests? Did you figure out why they suddenly started failing for lgbm and xgb? Do they fail on your local environment or also on github's pipelines?

Copy link
Contributor

@Matgrb Matgrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are looking great!
I left some comments, lef me know what do you think!

probatus/feature_elimination/feature_elimination.py Outdated Show resolved Hide resolved
setup.py Outdated
@@ -18,7 +18,9 @@ def read(fname):
"tqdm>=4.41.0",
"shap >= 0.38.1, < 0.39.0",# 0.40.0 causes issues in certain plots. For now it is excluded
"numpy>=1.19.0",
"lightgbm>=3.3.0"
"lightgbm>=3.3.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since now we don't import the lightgbm at the top of the file, the 3 packages can go to the extra dependencies. In case someone wants to use one of these models in shap RFECV the users will take care of installing them since they will need to pass them.
I would put these 3 into extra dependencies. What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve this issue I have taken a slightly unorthodox approach.

The base idea is that if a user is passing a model from Catboost, XGboost or LightGBM to RFECV, he/she already has this library installed in their environment. So I have thought to surround each import with a try except statement and to ignore import errors. Please let me know if you are ok with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not unorthodox at all actually :) You can take this a step further and replace the import with a 'mock' instance where .__call__ will invoke a helpful 'not installed' error only when the import is actually used. probatus already supports this, you can use utils.exceptions.NotInstalledError. See

class NotInstalledError:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @timvink, thanks for joining this review :).

Sorry, I did not understand what do you mean when you say

... replace the import with a 'mock' instance ...

The intention of my code is:

I am a user of Probatus that likes to use LightGBM.
I already have a virtual environment where Probatus and LightGBM are installed.
I use the class RFECV of Probatus passing the LightGBM classifier instance I have created earlier in my code.
Probatus tries to import cat boosts, but it fails because the user has installed LightGBM - this step is ignored.
Probatus tries to import LightGBM and it succeeds.
Probatus Cretes fit params compatible with LightGBM.

Now Imagine the user wants to pass something else to RFECV, like a RandomForestClassifier from sklearn. All of the imports will fail and the user will receive a model not supported error.

It seems like a legit behaviour to me.

What would you suggest instead :) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, still keeping an eye on the probatus development :)

I didn't look in detail enough, I thought you had a top level try: import .. except: pass and then NotInstalledError can help with a 'mock' to create a more helpfull error message.

I looked at your implementation of def _get_fit_params and I think it's quite elegant actually!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :), happy to hear that you are still hanging around ;)

@@ -322,8 +323,47 @@ def test_shap_rfe_early_stopping(complex_data, capsys):
assert len(out) == 0


def test_shap_rfe_early_stopping_CatBoost(complex_data, capsys):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add one test for XGBoost, just to make sure it works as well ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, something a bit weird happened tho.

In the last test XGBoost gets a test score of 0.3333.

.gitignore Show resolved Hide resolved
@ClaudioSalvatoreArcidiacono
Copy link
Author

Mi @Matgrb, thanks for your comments. I will try to address them in the next commits. Do you have any idea why the tests on Mac and windows are failing? looks like something related to libomp. Do you know if it is it installed during the pipeline?

@ClaudioSalvatoreArcidiacono
Copy link
Author

Hi @Matgrb, can we rerun the workflows? I think that they failed for a temporary issue.

Rather than that, all of the comments should have been now solved :).

Let me know if you have other comments.

@ClaudioSalvatoreArcidiacono
Copy link
Author

Thanks for re-triggering the workflow. It is still failing.

On the positive side, I managed to reproduce the error in my local. I am going to investigate about it and keep you updated.

ValueError: I/O operation on closed file
@ClaudioSalvatoreArcidiacono
Copy link
Author

The issue with the tests should have been now fixed.

Apparently if we import CatBoostClassifier multiple times (for example, we import it in more than 1 test) something weird happens. This is probably due to the initialisation of catboost.

The solution for me was to import the CatBoostClassifier class in a session scoped fixture.

@timvink
Copy link
Contributor

timvink commented Dec 8, 2021

That's a nasty error to debug, nice find. Could that happen to users as well? If so, perhaps a more robust solution would be to do something like:

import sys
if "catboost" not in sys.modules:
  import catboost

Copy link
Contributor

@Matgrb Matgrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the effort, the changes look great!
I will release a new version of probatus to pypi today!

@timvink
Copy link
Contributor

timvink commented Dec 8, 2021

: 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants