
Update Probatus to use the latest version of SHAP #225

Closed
ReinierKoops opened this issue Jul 27, 2023 · 23 comments
Labels
enhancement New feature or request

Comments


ReinierKoops commented Jul 27, 2023

Right now we are using an outdated version of SHAP. Given the sudden burst of SHAP releases a few weeks ago, it is now possible to fix several SHAP-related issues in Probatus by updating the dependency.

ReinierKoops added the enhancement label on Jul 27, 2023

detrin commented Aug 19, 2023

I am trying to solve this one for fun. It seems it won't be so simple. I am testing with Python 3.11 and I am getting test failures:

$ pytest -sv

======================================================================= FAILURES ========================================================================
_______________________________________________________ test_shap_automatic_num_feature_selection _______________________________________________________

    def test_shap_automatic_num_feature_selection():
        """
        Test automatic num feature selection methods
        """
        X = pd.DataFrame(
            {
                "col_1": [1, 0, 1, 0, 1, 0, 1, 0],
                "col_2": [0, 0, 0, 0, 0, 1, 1, 1],
                "col_3": [1, 1, 1, 0, 0, 0, 0, 0],
            }
        )
        y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])
    
        clf = DecisionTreeClassifier(max_depth=1, random_state=1)
        shap_elimination = ShapRFECV(
            clf,
            random_state=1,
            step=1,
            cv=2,
            scoring="roc_auc",
            n_jobs=1,
        )
        _ = shap_elimination.fit_compute(X, y, approximate=True, check_additivity=False)
    
        best_features = shap_elimination.get_reduced_features_set(num_features="best")
        best_coherent_features = shap_elimination.get_reduced_features_set(
            num_features="best_coherent",
        )
        best_parsimonious_features = shap_elimination.get_reduced_features_set(num_features="best_parsimonious")
    
>       assert best_features == ["col_3"]
E       AssertionError: assert array(['col_2'], dtype=object) == ['col_3']
E         Full diff:
E         - ['col_3']
E         + array(['col_2'], dtype=object)

tests/feature_elimination/test_feature_elimination.py:328: AssertionError
__________________________________________________ test_shap_rfe_same_features_are_kept_after_each_run __________________________________________________

    def test_shap_rfe_same_features_are_kept_after_each_run():
        """
        Test a use case which appears to be flickering with Probatus 1.8.9 and lower.
    
        Expected result: every run the same outcome.
        Probatus <= 1.8.9: A different order every time.
        """
        SEED = 1234
    
        feature_names = [(f"f{num}") for num in range(1, 21)]
    
        # Code from tutorial on probatus documentation
        X, y = make_classification(
            n_samples=100,
            class_sep=0.05,
            n_informative=6,
            n_features=20,
            random_state=SEED,
            n_redundant=10,
            n_clusters_per_class=1,
        )
        X = pd.DataFrame(X, columns=feature_names)
    
        random_forest = RandomForestClassifier(
            random_state=SEED,
            n_estimators=70,
            max_features="log2",
            criterion="entropy",
            class_weight="balanced",
        )
    
        shap_elimination = ShapRFECV(
            clf=random_forest,
            step=0.2,
            cv=5,
            scoring="f1_macro",
            n_jobs=1,
            random_state=SEED,
        )
    
        report = shap_elimination.fit_compute(X, y, check_additivity=True, seed=SEED)
        # Return the set of features with the best validation accuracy
    
        kept_features = list(report.iloc[[report["val_metric_mean"].idxmax() - 1]]["features_set"].to_list()[0])
    
        # Results from the first run
>       assert ["f6", "f10", "f12", "f14", "f15", "f17", "f18", "f20"] == kept_features
E       AssertionError: assert ['f6', 'f10',...', 'f17', ...] == ['f2', 'f3', ...', 'f12', ...]
E         At index 0 diff: 'f6' != 'f2'
E         Right contains 5 more items, first extra item: 'f15'
E         Full diff:
E           [
E         -  'f2',
E         -  'f3',
E            'f6',...
E         
E         ...Full output truncated (11 lines hidden), use '-vv' to show

tests/feature_elimination/test_feature_elimination.py:402: AssertionError
______________________________________________________________ test_shap_resemblance_class ______________________________________________________________

X1 =    col_1  col_2  col_3
1      1      0      0
2      1      0      0
3      1      0      0
4      1      0      0
X2 =    col_1  col_2  col_3
1      0      0      0
2      0      0      0
3      0      0      0
4      0      0      0

    def test_shap_resemblance_class(X1, X2):
        """
        Test.
        """
        clf = DecisionTreeClassifier(max_depth=1, random_state=1)
        rm = SHAPImportanceResemblance(clf, test_prc=0.5, n_jobs=1, random_state=42)
    
        # Before fit it should raise an exception
        with pytest.raises(NotFittedError) as _:
            rm._check_if_fitted()
    
        actual_report, train_score, test_score = rm.fit_compute(X1, X2, return_scores=True)
    
        # After the fit this should not raise any error
        rm._check_if_fitted()
    
        assert train_score == 1
        assert test_score == 1
    
        # Check report shape
        assert actual_report.shape == (3, 2)
        # Check if it is sorted by importance
        assert actual_report.iloc[0].name == "col_1"
        # Check report values
        assert actual_report.loc["col_1"]["mean_abs_shap_value"] > 0
>       assert actual_report.loc["col_1"]["mean_shap_value"] >= 0
E       assert -0.5 >= 0

tests/sample_similarity/test_resemblance_model.py:142: AssertionError
________________________________________________________ test_shap_resemblance_class_lin_models _________________________________________________________

X1 =    col_1  col_2  col_3
1      1      0      0
2      1      0      0
3      1      0      0
4      1      0      0
X2 =    col_1  col_2  col_3
1      0      0      0
2      0      0      0
3      0      0      0
4      0      0      0

    def test_shap_resemblance_class_lin_models(X1, X2):
        """
        Test.
        """
        # Test SHAP Resemblance Model for linear models.
        clf = LogisticRegression()
        rm = SHAPImportanceResemblance(clf, test_prc=0.5, n_jobs=1, random_state=42)
    
        # Before fit it should raise an exception
        with pytest.raises(NotFittedError) as _:
            rm._check_if_fitted()
    
        actual_report, train_score, test_score = rm.fit_compute(
            X1, X2, return_scores=True, approximate=True, check_additivity=False
        )
    
        # After the fit this should not raise any error
        rm._check_if_fitted()
    
        assert train_score == 1
        assert test_score == 1
    
        # Check report shape
        assert actual_report.shape == (3, 2)
        # Check if it is sorted by importance
        assert actual_report.iloc[0].name == "col_1"
        # Check report values
        assert actual_report.loc["col_1"]["mean_abs_shap_value"] > 0
>       assert actual_report.loc["col_1"]["mean_shap_value"] > 0
E       assert -0.4010580957517599 > 0

tests/sample_similarity/test_resemblance_model.py:184: AssertionError
=================================================================== warnings summary ====================================================================
tests/binning/test_binning.py::test_quantile_with_unique_values
  Unable to calculate quantile bins for this feature, because possibly there is too many duplicate values.Approximated quantiles, as a result,the multiple boundaries have the same value. The number of bins has been lowered to [-1.         -0.98970775 -0.83365883 -0.70486693 -0.46046316 -0.25796119
   -0.03009015]. This can cause issue if you want to calculate the statistical test based on this binning. We suggest to retry with max number of bins of [-1.         -0.98970775 -0.83365883 -0.70486693 -0.46046316 -0.25796119
   -0.03009015] or apply different type of binning e.g. simple. If you run this functionality in AutoDist for multiple features, then you can decrease the bins only for that feature in a separate AutoDist run.

tests/docs/test_docstring.py::test_class_docstrings[QuantileBucketer]
  Unable to calculate quantile bins for this feature, because possibly there is too many duplicate values.Approximated quantiles, as a result,the multiple boundaries have the same value. The number of bins has been lowered to [0.         0.33333333 1.        ]. This can cause issue if you want to calculate the statistical test based on this binning. We suggest to retry with max number of bins of [0.         0.33333333 1.        ] or apply different type of binning e.g. simple. If you run this functionality in AutoDist for multiple features, then you can decrease the bins only for that feature in a separate AutoDist run.

tests/docs/test_docstring.py: 14 warnings
tests/interpret/test_shap_dependence.py: 4 warnings
tests/metric_volatility/test_metric_volatility.py: 1 warning
tests/sample_similarity/test_resemblance_model.py: 2 warnings
  Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.

tests/docs/test_docstring.py: 5 warnings
tests/interpret/test_model_interpret.py: 2 warnings
tests/interpret/test_shap_dependence.py: 6 warnings
  Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.

tests/docs/test_docstring.py::test_class_docstrings[ShapModelInterpreter]
tests/interpret/test_model_interpret.py::test_shap_interpret
tests/interpret/test_model_interpret.py::test_shap_interpret
tests/interpret/test_model_interpret.py::test_shap_interpret_lin_models
tests/interpret/test_model_interpret.py::test_shap_interpret_lin_models
tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class2
  No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored

tests/docs/test_docstring.py::test_class_docstrings[ShapModelInterpreter]
tests/interpret/test_model_interpret.py::test_shap_interpret
tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class2
  The figure layout has changed to tight

tests/docs/test_docstring.py::test_class_docstrings[TrainTestVolatility]
tests/docs/test_docstring.py::test_class_docstrings[SplitSeedVolatility]
tests/metric_volatility/test_metric_volatility.py::test_fit_train_test_sample_seed
tests/metric_volatility/test_metric_volatility.py::test_fit_compute_complex
  This function is deprecated. Please call randint(0, 999999 + 1) instead

tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_cols_to_keep
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_cols_to_keep
  
  2 fits failed out of a total of 4.
  The score on these train-test partitions for these parameters will be set to nan.
  If these failures are not expected, you can try to debug them by setting error_score='raise'.
  
  Below are more details about the failures:
  --------------------------------------------------------------------------------
  2 fits failed with the following error:
  Traceback (most recent call last):
    File "/Users/danielherman/Documents/projects/temp/probatus/env/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 732, in _fit_and_score
      estimator.fit(X_train, y_train, **fit_params)
    File "/Users/danielherman/Documents/projects/temp/probatus/env/lib/python3.11/site-packages/sklearn/base.py", line 1144, in wrapper
      estimator._validate_params()
    File "/Users/danielherman/Documents/projects/temp/probatus/env/lib/python3.11/site-packages/sklearn/base.py", line 637, in _validate_params
      validate_parameter_constraints(
    File "/Users/danielherman/Documents/projects/temp/probatus/env/lib/python3.11/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
      raise InvalidParameterError(
  sklearn.utils._param_validation.InvalidParameterError: The 'min_samples_split' parameter of DecisionTreeClassifier must be an int in the range [2, inf) or a float in the range (0.0, 1.0]. Got 1 instead.

tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_cols_to_keep
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_cols_to_keep
  One or more of the test scores are non-finite: [nan  1.]

tests/feature_elimination/test_feature_elimination.py::test_complex_dataset
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_early_stopping_lightGBM
tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
tests/metric_volatility/test_metric_volatility.py::test_fit_compute_complex
  The following variables in X contains missing values ['f2_missing']. Make sure to impute missing or apply a model that handles them automatically.

tests/feature_elimination/test_feature_elimination.py::test_complex_dataset
tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_early_stopping_lightGBM
tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
tests/metric_volatility/test_metric_volatility.py::test_fit_compute_complex
  The following variables in X contains categorical variables: ['f1_categorical']. Make sure to use a model that handles them automatically or encode them into numerical variables.

tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_early_stopping_CatBoost
  numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 232 from PyObject

tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_randomized_search_early_stopping_lightGBM
  Early stopping will be used only during Shapley value estimation step, and not for hyperparameter optimization.

tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
  The following variables in X_train contains missing values ['f2_missing']. Make sure to impute missing or apply a model that handles them automatically.

tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
  The following variables in X_train contains categorical variables: ['f1_categorical']. Make sure to use a model that handles them automatically or encode them into numerical variables.

tests/interpret/test_model_interpret.py::test_shap_interpret_complex_data
  The following variables in X_test contains categorical variables: ['f1_categorical']. Make sure to use a model that handles them automatically or encode them into numerical variables.

tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_psi
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_default
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_default
tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
tests/stat_tests/test_stat_tests.py::test_psi_returns_large
  PSI: Some of the buckets have zero counts. In theory this situation would mean PSI=Inf due to division by 0. However, we artificially modified the count of samples in these bins to a small number. This may cause that the PSI value for this feature is over-estimated (larger). Decreasing the number of buckets may also help avoid buckets with zero counts.

tests/stat_tests/test_distribution_statistics.py: 10 warnings
tests/stat_tests/test_stat_tests.py: 1 warning
  p-value capped: true value larger than 0.25

tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
  Input data for shapiro has range zero. The results may not be accurate.

tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
  PSI is not well-behaved when using more than 20 bins.

tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
tests/stat_tests/test_distribution_statistics.py::test_distribution_statistics_autodist_base
  invalid value encountered in log

tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
  Missing values in column 0 have been removed

tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
  Missing values in column 1 have been removed

tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
  Missing values in column 2 have been removed

tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
  Missing values in column 3 have been removed

tests/stat_tests/test_distribution_statistics.py::test_missing_values_in_autodist
  Missing values in column 4 have been removed

tests/stat_tests/test_distribution_statistics.py::test_warnings_are_issued_for_missing
tests/stat_tests/test_distribution_statistics.py::test_warnings_are_issued_for_missing
tests/stat_tests/test_distribution_statistics.py::test_warnings_are_issued_for_missing
tests/stat_tests/test_distribution_statistics.py::test_warnings_are_issued_for_missing
  Passing None has been deprecated.
  See https://docs.pytest.org/en/latest/how-to/capture-warnings.html#additional-use-cases-of-warnings-in-tests for alternatives in common use cases.

tests/stat_tests/test_stat_tests.py::test_ad_returns_small
  p-value floored: true value smaller than 0.001

tests/utils/test_utils_array_funcs.py::test_check_1d_array
tests/utils/test_utils_array_funcs.py::test_check_1d_array
  Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

tests/utils/test_utils_array_funcs.py::test_preprocess_labels
  The labels in y2 contain [False] unique values. The features in probatus support binary classification models, thus, the feature might not work correctly.

tests/utils/test_utils_array_funcs.py::test_preprocess_labels
  The labels in y3 contain [0 1 2 3 4] unique values. The features in probatus support binary classification models, thus, the feature might not work correctly.

tests/utils/test_utils_array_funcs.py::test_preprocess_data
  The following variables in X1 contains missing values ['2']. Make sure to impute missing or apply a model that handles them automatically.

tests/utils/test_utils_array_funcs.py::test_preprocess_data
  The following variables in X1 contains categorical variables: ['1']. Make sure to use a model that handles them automatically or encode them into numerical variables.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================ short test summary info ================================================================
FAILED tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection - AssertionError: assert array(['col_2'], dtype=object) == ['col_3']
FAILED tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_same_features_are_kept_after_each_run - AssertionError: assert ['f6', 'f10',...', 'f17', ...] == ['f2', 'f3', ...', 'f12', ...]
FAILED tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class - assert -0.5 >= 0
FAILED tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class_lin_models - assert -0.4010580957517599 > 0
===================================== 4 failed, 127 passed, 6 skipped, 1 xfailed, 114 warnings in 66.45s (0:01:06) ======================================


detrin commented Aug 19, 2023

With versions

shap==0.42.1
numpy==1.23.3
numba==0.57.0

so the trouble is likely with the new release of shap. I know there was some initiative to resurrect the project. I will check later today what the most likely issue in ShapRFECV is.


detrin commented Aug 19, 2023

It seems the only relevant imports are

from shap import Explainer
from shap.explainers._tree import Tree
from shap.utils import sample
from sklearn.pipeline import Pipeline

at https://github.com/ing-bank/probatus/blob/main/probatus/utils/shap_helpers.py#L25


detrin commented Aug 19, 2023

Okay, so in test_shap_automatic_num_feature_selection, when I look at

X = pd.DataFrame(
      {
          "col_1": [1, 0, 1, 0, 1, 0, 1, 0],
          "col_2": [0, 0, 0, 0, 0, 1, 1, 1],
          "col_3": [1, 1, 1, 0, 0, 0, 0, 0],
      }
  )
  y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

both "col_2" and "col_3" seems to be valid options, so this is more issue of designed test. It is a bit unfair that random seed in this version of shape is not preserved with results.


detrin commented Aug 19, 2023

The same holds for test_shap_rfe_same_features_are_kept_after_each_run, as the information value across the columns is symmetric. I will check it by editing the test; a sketch of such an edit follows below.
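
A hedged sketch of what such an edit might look like (the run_elimination helper below is hypothetical and simply repeats the setup from the failing test): instead of pinning an exact feature list, run the elimination twice with the same seed and assert that both runs keep the same set.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from probatus.feature_elimination import ShapRFECV

def run_elimination(seed):
    # Hypothetical helper: rebuild the setup from the failing test and return
    # the feature set kept at the best mean validation score.
    X, y = make_classification(
        n_samples=100, class_sep=0.05, n_informative=6, n_features=20,
        random_state=seed, n_redundant=10, n_clusters_per_class=1,
    )
    X = pd.DataFrame(X, columns=[f"f{num}" for num in range(1, 21)])
    clf = RandomForestClassifier(random_state=seed, n_estimators=70, max_features="log2")
    shap_elimination = ShapRFECV(clf, step=0.2, cv=5, scoring="f1_macro", n_jobs=1, random_state=seed)
    report = shap_elimination.fit_compute(X, y, seed=seed)
    return set(report.loc[report["val_metric_mean"].idxmax(), "features_set"])

# Determinism check that does not hard-code which of the symmetric features wins.
assert run_elimination(1234) == run_elimination(1234)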

ReinierKoops (Author) commented

Amazing work, looking forward to reviewing your PR.


detrin commented Aug 21, 2023

Okay, so I started digging into it a while ago and found out that it is caused by the packages in the all extra. With just the base package installed on Python 3.11, I currently get an obvious error:

============================================================================================ short test summary info ============================================================================================
ERROR tests/feature_elimination/test_feature_elimination.py::test_get_feature_shap_values_per_fold_early_stopping_CatBoost - ModuleNotFoundError: No module named 'catboost'
======================================================================= 130 passed, 6 skipped, 1 xfailed, 106 warnings, 1 error in 53.97s =======================================================================
$ pip freeze
cloudpickle==2.2.1
contourpy==1.1.0
cycler==0.11.0
fonttools==4.42.1
joblib==1.3.2
kiwisolver==1.4.4
llvmlite==0.40.1
matplotlib==3.7.2
numba==0.57.1
numpy==1.24.4
packaging==23.1
pandas==2.0.3
Pillow==10.0.0
-e git+https://github.com/detrin/probatus.git@fd0cadd624f7aec32fc06a08336125a013d6e3e9#egg=probatus
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2023.3
scikit-learn==1.3.0
scipy==1.11.2
shap==0.42.1
six==1.16.0
slicer==0.0.7
threadpoolctl==3.2.0
tqdm==4.66.1
tzdata==2023.3

With the extras installed I am getting failures at:

============================================================================================ short test summary info ============================================================================================
FAILED tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection - AssertionError: assert array(['col_2'], dtype=object) == ['col_3']
FAILED tests/feature_elimination/test_feature_elimination.py::test_shap_rfe_same_features_are_kept_after_each_run - AssertionError: assert ['f6', 'f10',...', 'f17', ...] == ['f2', 'f3', ...', 'f12', ...]
FAILED tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class - assert -0.5 >= 0
FAILED tests/sample_similarity/test_resemblance_model.py::test_shap_resemblance_class_lin_models - assert -0.4010580957517599 > 0
FAILED tests/utils/test_utils_array_funcs.py::test_check_1d_array - ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
================================================================= 5 failed, 126 passed, 6 skipped, 1 xfailed, 112 warnings in 87.25s (0:01:27) ==================================================================

and the packages installed are

$ pip freeze
anyio==3.7.1
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-lru==2.0.4
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
black==23.7.0
bleach==6.0.0
bracex==2.3.post1
catboost==1.2
certifi==2023.7.22
cffi==1.15.1
cfgv==3.4.0
charset-normalizer==3.2.0
click==8.1.7
cloudpickle==2.2.1
codespell==2.2.5
colorama==0.4.6
comm==0.1.4
contourpy==1.1.0
coverage==7.3.0
csscompressor==0.9.5
cycler==0.11.0
debugpy==1.6.7.post1
decorator==5.1.1
defusedxml==0.7.1
distlib==0.3.7
executing==1.2.0
fastjsonschema==2.18.0
filelock==3.12.2
fonttools==4.42.1
fqdn==1.5.1
ghp-import==2.1.0
gitdb==4.0.10
GitPython==3.1.32
graphviz==0.20.1
griffe==0.34.0
htmlmin2==0.1.13
identify==2.5.26
idna==3.4
iniconfig==2.0.0
ipykernel==6.25.1
ipython==8.14.0
ipython-genutils==0.2.0
ipywidgets==8.1.0
isoduration==20.11.0
isort==5.12.0
jedi==0.19.0
Jinja2==3.1.2
joblib==1.3.2
jsmin==3.0.1
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.7.0
jupyter-lsp==2.2.0
jupyter_client==8.3.0
jupyter_core==5.3.1
jupyter_server==2.7.2
jupyter_server_terminals==0.4.4
jupyterlab==4.0.5
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.8
jupyterlab_server==2.24.0
kiwisolver==1.4.4
lightgbm==4.0.0
llvmlite==0.40.1
Markdown==3.4.4
MarkupSafe==2.1.3
matplotlib==3.7.2
matplotlib-inline==0.1.6
mergedeep==1.3.4
mistune==3.0.1
mkdocs==1.5.2
mkdocs-autorefs==0.5.0
mkdocs-awesome-pages-plugin==2.9.2
mkdocs-enumerate-headings-plugin==0.6.1
mkdocs-git-authors-plugin==0.7.2
mkdocs-git-revision-date-localized-plugin==1.2.0
mkdocs-markdownextradata-plugin==0.2.5
mkdocs-material==9.1.21
mkdocs-material-extensions==1.1.1
mkdocs-minify-plugin==0.7.1
mkdocs-print-site-plugin==2.3.5
mkdocs-table-reader-plugin==2.0.1
mkdocstrings==0.22.0
mkdocstrings-python==1.5.0
mknotebooks==0.8.0
mypy==1.5.1
mypy-extensions==1.0.0
natsort==8.4.0
nbclient==0.8.0
nbconvert==7.7.4
nbformat==5.9.2
nest-asyncio==1.5.7
nodeenv==1.8.0
notebook==7.0.2
notebook_shim==0.2.3
numba==0.57.1
numpy==1.24.4
overrides==7.4.0
packaging==23.1
pandas==2.0.3
pandocfilters==1.5.0
parso==0.8.3
pathspec==0.11.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.0.0
platformdirs==3.10.0
plotly==5.16.1
pluggy==1.2.0
pre-commit==3.3.3
-e git+https://github.com/detrin/probatus.git@fd0cadd624f7aec32fc06a08336125a013d6e3e9#egg=probatus
prometheus-client==0.17.1
prompt-toolkit==3.0.39
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
pyflakes==3.1.0
Pygments==2.16.1
pymdown-extensions==10.1
pyparsing==3.0.9
pytest==7.4.0
pytest-cov==4.1.0
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0.1
pyyaml_env_tag==0.1
pyzmq==25.1.1
qtconsole==5.4.3
QtPy==2.3.1
referencing==0.30.2
regex==2023.8.8
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.9.2
ruff==0.0.285
scikit-learn==1.3.0
scipy==1.11.2
seaborn==0.12.2
Send2Trash==1.8.2
shap==0.42.1
six==1.16.0
slicer==0.0.7
smmap==5.0.0
sniffio==1.3.0
soupsieve==2.4.1
stack-data==0.6.2
tabulate==0.9.0
tenacity==8.2.3
terminado==0.17.1
threadpoolctl==3.2.0
tinycss2==1.2.1
tornado==6.3.3
tqdm==4.66.1
traitlets==5.9.0
typing_extensions==4.7.1
tzdata==2023.3
uri-template==1.3.0
urllib3==2.0.4
virtualenv==20.24.3
watchdog==3.0.0
wcmatch==8.4.1
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.1
widgetsnbextension==4.0.8
xgboost==1.7.6

It seems that none of the extra packages pulls back (downgrades) any of the regular dependencies.

I had an idea that it might be caused by pytest itself in some way. Comparing the pytest run against the installation without extras:

~/Documents/projects/temp/probatus_fork main !1 ?2 ❯ pytest -sv                                                                                                                                 Py probatus_fork
============================================================================================== test session starts ==============================================================================================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/danielherman/.pyenv/versions/3.10.6/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/danielherman/Documents/projects/temp/probatus_fork
plugins: anyio-3.6.2

and of course it just spits out the catboost error. Perhaps pytest is accessing the system-installed packages. So, I have created the following test:

import subprocess

def test_pip_freeze():
    try:
        result = subprocess.run(['pip', 'freeze'], capture_output=True, text=True, check=True)
        
        print("Pip freeze output:")
        print(result.stdout)
        
    except subprocess.CalledProcessError as e:
        print("An error occurred:", e)

and of course it prints out my system packages, so this might help pinpoint which packages caused the test failures.
Here are the differences

cloudpickle==2.2.0
contourpy==1.0.5
cycler==0.11.0
fonttools==4.37.4
joblib==1.2.0
kiwisolver==1.4.4
llvmlite==0.39.1
matplotlib==3.5.3
numba==0.56.4
numpy==1.21.6
packaging==21.3
pandas==1.3.5
Pillow==9.3.0
grep: git+https://github.com/detrin/probatus.git@fd0cadd624f7aec32fc06a08336125a013d6e3e9#egg=: No such file or directory
pyparsing==3.0.9
python-dateutil==2.8.1
pytz==2019.3
scikit-learn==1.1.2
scipy==1.8.1
shap==0.41.0
six==1.16.0
slicer==0.0.7
threadpoolctl==3.1.0
tqdm==4.64.1
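
As a side note on the pytest/system-packages suspicion above, a minimal way to make sure tests resolve imports from the active virtualenv (a generic sketch, not something used in this thread) is to launch pytest through the environment's own interpreter:

import subprocess
import sys

# Run pytest via the current interpreter so test collection uses this
# virtualenv's site-packages rather than whatever `pytest` is first on PATH.
subprocess.run([sys.executable, "-m", "pytest", "-sv"], check=True)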


detrin commented Aug 21, 2023

I am not sure how much time I will have today for this issue. Apparently, it is not caused by scikit-learn; I have tried 1.2.0 and I am still getting the same failures.


detrin commented Aug 21, 2023

It won't be numba; version 0.57.0 still gives errors in pytest. When switching back to shap==0.41.0 it gives me a lot of JIT errors, even though I have numba==0.57.1.


detrin commented Aug 21, 2023

Okay, so all tests are passing with

numpy==1.23.5
shap==0.41.0
numba==0.57.1

the rest can be updated. Note that shap used to pin numpy<=1.23.5, so I am pretty confident this was caused solely by the shap release.


detrin commented Aug 21, 2023

Looking at the changes in the Explainer class used by this package (https://github.com/shap/shap/blame/bbee3787139954ea355854001a977132ec8123b0/shap/explainers/_explainer.py#L17) I don't see anything relevant.


detrin commented Aug 21, 2023

Inspecting the shap codebase, I don't see any relevant change in the logic of _kernel.py, _linear.py or _tree.py between tags v0.41.0 and v0.42.1. I will check whether there are any changes in the output of shap_calc().


detrin commented Aug 21, 2023

I can now confirm that the logic changed on the shap package side and that this is the reason why the tests are failing!
I added the following after line https://github.com/ing-bank/probatus/blob/main/probatus/feature_elimination/feature_elimination.py#L398:

shap_values = shap_calc(clf, X_val, verbose=self.verbose, **shap_kwargs)
print(shap_values.shape)
print(shap_values)

With shap==0.41.0

tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection (4, 3)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 1. 0.]]
(4, 3)
[[0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
(4, 2)
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [1. 0.]]
(4, 2)
[[0. 0.]
 [0. 1.]
 [0. 1.]
 [0. 1.]]
(4, 1)
[[0.        ]
 [0.        ]
 [0.66666667]
 [0.66666667]]
(4, 1)
[[0.]
 [1.]
 [1.]
 [1.]]

With shap==0.42.1

tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection (4, 3)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 1. 0.]]
(4, 3)
[[ 0.  0. -1.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
(4, 2)
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [1. 0.]]
(4, 2)
[[ 0. -1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
(4, 1)
[[0.]
 [0.]
 [0.]
 [1.]]
(4, 1)
[[-0.66666667]
 [-0.66666667]
 [ 0.        ]
 [ 0.        ]]


detrin commented Aug 21, 2023

To further prove the point, I used

print("================")
print("shap_values")
print(shap_values.shape)
print(shap_values)
print("X_val")
print(X_val.shape)
print(X_val)
print("clf LogReg weights")
print(clf.coef_, clf.intercept_)

and changed in test_shap_automatic_num_feature_selection()

clf = DecisionTreeClassifier(max_depth=1, random_state=1)

into

clf = LogisticRegression(C=1, solver="liblinear")

With shap==0.41.0 we have

tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection ================
shap_values
(4, 3)
[[-0.          0.         -0.        ]
 [ 0.01426327  0.         -0.        ]
 [-0.          0.          0.37412245]
 [ 0.01426327  0.71643134  0.37412245]]
X_val
(4, 3)
   col_1  col_2  col_3
0      1      0      1
1      0      0      1
4      1      0      0
5      0      1      0
clf LogReg weights
[[-0.01426327  0.71643134 -0.37412245]] [-0.12618387]
================
shap_values
(4, 3)
[[ 0.          0.         -0.        ]
 [-0.08174514  0.          0.72205473]
 [ 0.          0.37990537  0.72205473]
 [-0.08174514  0.37990537  0.72205473]]
X_val
(4, 3)
   col_1  col_2  col_3
2      1      0      1
3      0      0      0
6      1      1      0
7      0      1      0
clf LogReg weights
[[ 0.08174514  0.37990537 -0.72205473]] [0.11004584]
================
shap_values
(4, 2)
[[ 0.         -0.        ]
 [ 0.         -0.        ]
 [ 0.          0.37629223]
 [ 0.71507298  0.37629223]]
X_val
(4, 2)
   col_2  col_3
0      0      1
1      0      1
4      0      0
5      1      0
clf LogReg weights
[[ 0.71507298 -0.37629223]] [-0.12901078]
================
shap_values
(4, 2)
[[ 0.         -0.        ]
 [ 0.          0.71507298]
 [ 0.37629223  0.71507298]
 [ 0.37629223  0.71507298]]
X_val
(4, 2)
   col_2  col_3
2      0      1
3      0      0
6      1      0
7      1      0
clf LogReg weights
[[ 0.37629223 -0.71507298]] [0.12901078]
================
shap_values
(4, 1)
[[-0.        ]
 [-0.        ]
 [ 0.41094646]
 [ 0.41094646]]
X_val
(4, 1)
   col_3
0      1
1      1
4      0
5      0
clf LogReg weights
[[-0.41094646]] [0.05089244]
================
shap_values
(4, 1)
[[-0.        ]
 [ 0.73101554]
 [ 0.73101554]
 [ 0.73101554]]
X_val
(4, 1)
   col_3
2      1
3      0
6      0
7      0
clf LogReg weights
[[-0.73101554]] [0.17948298]

With shap==0.42.1

tests/feature_elimination/test_feature_elimination.py::test_shap_automatic_num_feature_selection ================
shap_values
(4, 3)
[[-0.          0.         -0.37412245]
 [ 0.01426327  0.         -0.37412245]
 [-0.          0.         -0.        ]
 [ 0.01426327  0.71643134 -0.        ]]
X_val
(4, 3)
   col_1  col_2  col_3
0      1      0      1
1      0      0      1
4      1      0      0
5      0      1      0
clf LogReg weights
[[-0.01426327  0.71643134 -0.37412245]] [-0.12618387]
================
shap_values
(4, 3)
[[ 0.         -0.37990537 -0.72205473]
 [-0.08174514 -0.37990537 -0.        ]
 [ 0.          0.         -0.        ]
 [-0.08174514  0.         -0.        ]]
X_val
(4, 3)
   col_1  col_2  col_3
2      1      0      1
3      0      0      0
6      1      1      0
7      0      1      0
clf LogReg weights
[[ 0.08174514  0.37990537 -0.72205473]] [0.11004584]
================
shap_values
(4, 2)
[[ 0.         -0.37629223]
 [ 0.         -0.37629223]
 [ 0.         -0.        ]
 [ 0.71507298 -0.        ]]
X_val
(4, 2)
   col_2  col_3
0      0      1
1      0      1
4      0      0
5      1      0
clf LogReg weights
[[ 0.71507298 -0.37629223]] [-0.12901078]
================
shap_values
(4, 2)
[[-0.37629223 -0.71507298]
 [-0.37629223 -0.        ]
 [ 0.         -0.        ]
 [ 0.         -0.        ]]
X_val
(4, 2)
   col_2  col_3
2      0      1
3      0      0
6      1      0
7      1      0
clf LogReg weights
[[ 0.37629223 -0.71507298]] [0.12901078]
================
shap_values
(4, 1)
[[0.        ]
 [0.        ]
 [0.        ]
 [0.73101554]]
X_val
(4, 1)
   col_2
0      0
1      0
4      0
5      1
clf LogReg weights
[[0.73101554]] [-0.17948298]
================
shap_values
(4, 1)
[[-0.41094646]
 [-0.41094646]
 [ 0.        ]
 [ 0.        ]]
X_val
(4, 1)
   col_2
2      0
3      0
6      1
7      1
clf LogReg weights
[[0.41094646]] [-0.05089244]

I am now confident that the change is in the tree kernel of the shap package.


detrin commented Aug 21, 2023

Looking at https://github.com/ing-bank/probatus/blob/main/probatus/utils/shap_helpers.py#L200, I now understand the source of the error on the probatus side.
The old version of shap will return shap_values

[[ 0.          0.         -0.        ]	
 [-0.08174514  0.          0.72205473]	
 [ 0.          0.37990537  0.72205473]	
 [-0.08174514  0.37990537  0.72205473]]

The new version of shap will return

[[ 0.         -0.37990537 -0.72205473]
[-0.08174514 -0.37990537 -0.        ]
[ 0.          0.         -0.        ]
[-0.08174514  0.         -0.        ]]

Looking at those shap values, I don't see anything wrong with them. I will need to dig deeper on the shap side.

@ReinierKoops feel free to have a look at my findings so far; maybe you have had a similar experience with shap in the past.
I would also like to note that returning a pd.DataFrame from calculate_shap_importance() is not a good pattern; a rough sketch of an alternative follows below.
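
To illustrate that remark (a purely hypothetical sketch; calculate_shap_importance_arrays below is not part of probatus), the same information could be returned as plain arrays, leaving it to the caller to assemble a report frame:

import numpy as np

def calculate_shap_importance_arrays(shap_values, column_names):
    # Hypothetical alternative: return plain arrays instead of a pd.DataFrame,
    # leaving presentation (building a report frame) to the caller.
    mean_abs_shap = np.mean(np.abs(shap_values), axis=0)  # per-feature mean |SHAP|
    mean_shap = np.mean(shap_values, axis=0)              # per-feature signed mean SHAP
    order = np.argsort(mean_abs_shap)[::-1]               # descending importance
    ordered_names = [column_names[i] for i in order]
    return ordered_names, mean_abs_shap[order], mean_shap[order]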

ReinierKoops (Author) commented

@detrin I've been reading along with your findings and can confirm your thorough approach, thank you. I'm a bit puzzled by the versioning of shap; it makes me question its backward compatibility. Other than that, I'm also thinking about what the next step should be in this case.


detrin commented Aug 22, 2023

I think the best thing for the sake of this package would be to find the cause on the shap side.

  • If it turns out to be a trivial change and the logic behind the shap values didn't change, we can just adjust shap_calc() so that it returns the same values as before.
  • If the change is bigger, in the sense that the logic behind the shap values did change, then the shap helper functions will need to change accordingly so that the selection gives back roughly the same features.


detrin commented Aug 23, 2023

After an additional 40 minutes of debugging and digging into shap and probatus, I found the error. The issue is that the random_state is not preserved in shap.utils.sample. I will submit an issue to the shap repository.

My suggested fix would be to correct the failing tests: we can now be sure that the logic of shap didn't change, and this change in the calculated masker should not affect the quality of results on the probatus side. @ReinierKoops

ReinierKoops (Author) commented

Amazing find, thank you @detrin


detrin commented Aug 24, 2023

@ReinierKoops No problem, I enjoyed this issue. The question now is whether you would like me to temporarily fix the tests, wait for the fix in shap, or do something else.

ReinierKoops (Author) commented

I think the best way is to introduce a temporary fix for now; we don't know when shap will be fixed.


detrin commented Aug 24, 2023

Awesome, you can expect a PR from me in the following days, probably today.

ReinierKoops pushed a commit that referenced this issue on Aug 26, 2023 (#225): Also fixed one test for numpy>=1.24.0.
ReinierKoops (Author) commented

Perfect, thanks!
