Compatibility early stopping catboost #175
Conversation
Looks very good to me. The refactoring of getting the fitting params makes things clearer as well. A few comments:
Thanks for your comments @operte, I have addressed the first three in my latest commits. About the last comment, I am not entirely sure the discussion section is the most appropriate place: right now it is very high level, and this point is very specific to one class. Shall we maybe write something in the class docstring instead?
Good point @ClaudioSalvatoreArcidiacono, maybe a short comment in the class docstring is more appropriate! What about the failing tests? Did you figure out why they suddenly started failing for lgbm and xgb? Do they fail in your local environment or also in GitHub's pipelines?
Changes are looking great!
I left some comments, let me know what you think!
setup.py (outdated diff):
```diff
@@ -18,7 +18,9 @@ def read(fname):
     "tqdm>=4.41.0",
     "shap >= 0.38.1, < 0.39.0",  # 0.40.0 causes issues in certain plots. For now it is excluded
     "numpy>=1.19.0",
-    "lightgbm>=3.3.0"
+    "lightgbm>=3.3.0",
```
Since we no longer import lightgbm at the top of the file, these three packages can go to the extra dependencies. If someone wants to use one of these models in SHAP RFECV, they will take care of installing it themselves, since they will need to pass the model instance anyway.
I would put these three into extra dependencies. What do you think?
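For illustration, a minimal sketch of what that could look like in setup.py; the extra name "models" and the xgboost/catboost version pins are assumptions, not probatus's actual configuration:

```python
# Assumption-based sketch: the boosting libraries become optional extras.
from setuptools import find_packages, setup

setup(
    name="probatus",
    packages=find_packages(),
    install_requires=[
        "tqdm>=4.41.0",
        "shap>=0.38.1,<0.39.0",
        "numpy>=1.19.0",
    ],
    extras_require={
        # Installed only on demand, e.g. `pip install probatus[models]`.
        "models": [
            "lightgbm>=3.3.0",
            "xgboost>=1.5.0",   # hypothetical pin
            "catboost>=1.0.0",  # hypothetical pin
        ],
    },
)
```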
To solve this issue I have taken a slightly unorthodox approach.
The basic idea is that if a user is passing a model from CatBoost, XGBoost or LightGBM to RFECV, they already have that library installed in their environment. So I thought to surround each import with a try/except statement and to ignore import errors. Please let me know if you are OK with it.
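For context, a minimal sketch of the guarded-import pattern described here (the exact import targets are assumptions, not necessarily the PR's code):

```python
# A missing package is silently tolerated at import time; it only matters
# once the user actually passes a model of that type.
try:
    from catboost import CatBoostClassifier, CatBoostRegressor
except ImportError:
    CatBoostClassifier = CatBoostRegressor = None

try:
    from lightgbm import LGBMModel
except ImportError:
    LGBMModel = None

try:
    from xgboost.sklearn import XGBModel
except ImportError:
    XGBModel = None
```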
Not unorthodox at all actually :) You can take this a step further and replace the import with a 'mock' instance whose `__call__` raises a helpful 'not installed' error only when the import is actually used. probatus already supports this; you can use `utils.exceptions.NotInstalledError`. See `probatus/utils/exceptions.py`, line 57 in 5c5b0aa (`class NotInstalledError:`).
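A minimal sketch of that mock pattern follows; the attribute and parameter names are assumptions, not necessarily probatus's exact implementation of `NotInstalledError`:

```python
class NotInstalledError:
    """Stand-in for an optional dependency that is not installed.

    The helpful error is raised only when the stand-in is actually used.
    """

    def __init__(self, tool, pip_name=None):
        self.message = f"{tool} is not installed."
        if pip_name is not None:
            self.message += f" Try: pip install {pip_name}"

    def __call__(self, *args, **kwargs):
        raise ModuleNotFoundError(self.message)


# At import time, a failed import is replaced by the stand-in:
try:
    from catboost import CatBoostClassifier
except ModuleNotFoundError:
    CatBoostClassifier = NotInstalledError("CatBoostClassifier", "catboost")
```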
Hi @timvink, thanks for joining this review :).
Sorry, I did not understand what you mean when you say:
> ... replace the import with a 'mock' instance ...
The intention of my code is the following:

1. I am a user of Probatus who likes to use LightGBM.
2. I already have a virtual environment where Probatus and LightGBM are installed.
3. I use the Probatus class RFECV, passing the LightGBM classifier instance I created earlier in my code.
4. Probatus tries to import catboost, but this fails because the user has only installed LightGBM; the import error is ignored.
5. Probatus tries to import lightgbm, and it succeeds.
6. Probatus creates fit params compatible with LightGBM.

Now imagine the user wants to pass something else to RFECV, like a RandomForestClassifier from sklearn. All of the imports will fail and the user will receive a "model not supported" error.
It seems like legitimate behaviour to me.
What would you suggest instead :) ?
No problem, still keeping an eye on the probatus development :)
I didn't look in enough detail; I thought you had a top-level `try: import .. except: pass`, and then `NotInstalledError` can help with a 'mock' to create a more helpful error message.
I looked at your implementation of `_get_fit_params` and I think it's quite elegant actually!
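For readers of this thread, a rough sketch of the dispatch idea behind `_get_fit_params` (assumption-based, not the PR's actual code): build the early-stopping fit kwargs that match whichever library the user's model comes from.

```python
def _get_fit_params(clf, X_val, y_val, early_stopping_rounds, eval_metric):
    """Return extra keyword arguments for clf.fit() that enable early stopping."""
    try:
        from lightgbm import LGBMModel

        if isinstance(clf, LGBMModel):
            return {
                "eval_set": [(X_val, y_val)],
                "eval_metric": eval_metric,
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    try:
        from xgboost.sklearn import XGBModel

        if isinstance(clf, XGBModel):
            return {
                "eval_set": [(X_val, y_val)],
                "eval_metric": eval_metric,
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    try:
        from catboost import CatBoost

        if isinstance(clf, CatBoost):
            # CatBoost takes the eval metric in the model constructor, so only
            # the eval set and the patience are passed at fit time.
            return {
                "eval_set": [(X_val, y_val)],
                "early_stopping_rounds": early_stopping_rounds,
            }
    except ImportError:
        pass

    raise ValueError(f"Model type {type(clf)} is not supported for early stopping.")
```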
Thanks :), happy to hear that you are still hanging around ;)
```diff
@@ -322,8 +323,47 @@ def test_shap_rfe_early_stopping(complex_data, capsys):
     assert len(out) == 0


+def test_shap_rfe_early_stopping_CatBoost(complex_data, capsys):
```
Can you also add one test for XGBoost, just to make sure it works as well?
Added, though something a bit weird happened: in the last test XGBoost gets a test score of 0.3333.
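A hypothetical sketch of the requested XGBoost test, mirroring the CatBoost one; the `EarlyStoppingShapRFECV` signature, the `fit_compute` call, and the contents of the `complex_data` fixture are assumptions based on this conversation, not the PR's actual code:

```python
from probatus.feature_elimination import EarlyStoppingShapRFECV
from xgboost import XGBClassifier


def test_shap_rfe_early_stopping_XGBoost(complex_data, capsys):
    X, y = complex_data
    clf = XGBClassifier(n_estimators=200, random_state=42)

    shap_elimination = EarlyStoppingShapRFECV(
        clf,
        step=1,
        cv=2,
        scoring="roc_auc",
        early_stopping_rounds=5,
        eval_metric="auc",
    )
    report = shap_elimination.fit_compute(X, y)

    # At least one elimination round should be recorded in the report.
    assert report.shape[0] > 0
```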
Hi @Matgrb, thanks for your comments. I will try to address them in the next commits. Do you have any idea why the tests on Mac and Windows are failing? It looks like something related to …
Hi @Matgrb, can we rerun the workflows? I think they failed due to a temporary issue. Other than that, all of the comments should now have been solved :). Let me know if you have other comments.
Thanks for re-triggering the workflow. It is still failing. On the positive side, I managed to reproduce the error locally:

```
ValueError: I/O operation on closed file
```

I am going to investigate it and keep you updated.
The issue with the tests should now be fixed. Apparently, if we import CatBoostClassifier multiple times (for example, in more than one test) something weird happens. This is probably due to the initialisation of catboost. The solution for me was to import the CatBoostClassifier class in a session-scoped fixture.
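A sketch of that session-scoped fixture approach (fixture and test names are assumptions): catboost is imported exactly once per test session and handed to the tests that need it.

```python
import pytest


@pytest.fixture(scope="session")
def catboost_classifier_class():
    """Import CatBoostClassifier only once for the whole test session."""
    from catboost import CatBoostClassifier

    return CatBoostClassifier


def test_catboost_model_builds(catboost_classifier_class):
    # Tests request the fixture instead of importing catboost themselves.
    clf = catboost_classifier_class(n_estimators=10, verbose=0)
    assert clf is not None
```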
That's a nasty error to debug, nice find. Could that happen to users as well? If so, perhaps a more robust solution would be to do something like:

```python
import sys

if "catboost" not in sys.modules:
    import catboost
```
Thank you for the effort, the changes look great!
I will release a new version of probatus to PyPI today! 🎉
This PR addresses issue #146