-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#150 Return SHAP values with ShapRFECV fit_compute call #171
#150 Return SHAP values with ShapRFECV fit_compute call #171
Conversation
Return SHAP values with ShapRFECV fit_compute call. Add new method to get shap values of a given reduced features set. Update tutorial notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions
@@ -673,6 +685,37 @@ def get_reduced_features_set(self, num_features): | |||
else: | |||
return self.report_df[self.report_df.num_features == num_features]["features_set"].values[0] | |||
|
|||
def get_reduced_features_set_shap_values(self, num_features, abs_shap_values=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to rename it to get_feature_importance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small unit test using very very simple handcrafted dataset, to get these shap values would be nice.
@@ -673,6 +685,37 @@ def get_reduced_features_set(self, num_features): | |||
else: | |||
return self.report_df[self.report_df.num_features == num_features]["features_set"].values[0] | |||
|
|||
def get_reduced_features_set_shap_values(self, num_features, abs_shap_values=True): | |||
""" | |||
Gets the shap values of a features set after the feature elimination process, for a given number of features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get feature importance (mean abs shap values)
"execution_count": 6, | ||
"source": [ | ||
"#First 5 rows of first 5 columns\n", | ||
"report[['num_features', 'features_set', 'val_metric_mean']]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can also add the mean abs shap values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i am thinking about this now, let's keep this part as it is, but somewhere later in the notebook, add a cell showing feature importances for the final feature set :)
@@ -292,6 +292,8 @@ def _report_current_results( | |||
train_metric_std, | |||
val_metric_mean, | |||
val_metric_std, | |||
features_mean_abs_shap_value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a dictionary or a list?
If a list, please make sure that this corresponds well to the current feature list. the feature set might be sorted based on the shap values. Something to double check :)
Also please have a look what has changed in, just to make sure your PR is aligned #175 |
@eduardo-lourenco Are you going to still work on this one PR? |
hi @eduardo-lourenco thanks for your contributions, could you update your PR to be made compatible with the latest changes on main? |
This PR is closed due to inactivity. Please reach out if you want to continue with this PR. |
This PR targets issue #150 and comprises the following changes:
I will update the related tests if this solution seems appropriate (current tests are still passing, we just need to add new tests).