
BUG: A couple of pytest tests fail out of the box when following CONTRIBUTING.md #3260

Closed
3 of 4 tasks
noxthot opened this issue Sep 14, 2023 · 4 comments
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

@noxthot
Contributor

noxthot commented Sep 14, 2023

Issue Description

After preparing the environment as described in CONTRIBUTING.md, some tests fail:

tests/explainers/test_deep.py sF.s.....   
tests/explainers/test_gradient.py F....

Reason:

E                                              tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
E                                                (0) UNIMPLEMENTED: DNN library is not found.
E                                                [[{{node conv2d/Conv2D}}]]
E                                                [[dropout_1/cond/then/_15/dropout/GreaterEqual/_201]]
E                                                (1) UNIMPLEMENTED: DNN library is not found.
E                                                [[{{node conv2d/Conv2D}}]]
E                                              0 successful operations.
E                                              0 derived errors ignored.

tests/explainers/test_tree.py ...s.......F.....s......................F.........

Reason 1:

tests/explainers/test_tree.py:446: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/shap/lib/python3.11/site-packages/xgboost/core.py:729: in inner_f
    return func(**kwargs)
../../../mambaforge/envs/shap/lib/python3.11/site-packages/xgboost/core.py:880: in __init__
    self.feature_names = feature_names
../../../mambaforge/envs/shap/lib/python3.11/site-packages/xgboost/core.py:1301: in feature_names
    feature_names = _validate_feature_info(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

feature_info = Index(['Age', 'Workclass', 'Education-Num', 'Marital Status', 'Occupation',
       'Relationship', 'Race', 'Sex', 'Capital Gain', 'Capital Loss',
       'Hours per week', 'Country'],
      dtype='object')
n_features = 12, name = 'feature names'

    def _validate_feature_info(
        feature_info: Sequence[str], n_features: int, name: str
    ) -> List[str]:
        if isinstance(feature_info, str) or not isinstance(feature_info, Sequence):
>           raise TypeError(
                f"Expecting a sequence of strings for {name}, got: {type(feature_info)}"
            )
E           TypeError: Expecting a sequence of strings for feature names, got: <class 'pandas.core.indexes.base.Index'>

../../../mambaforge/envs/shap/lib/python3.11/site-packages/xgboost/core.py:309: TypeError
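The TypeError above comes from a stricter feature-name validation in newer xgboost releases, which reject a pandas Index where they previously accepted one. A minimal workaround sketch (the column names below are illustrative, not taken from the test): convert the Index to a plain list of strings before handing it to xgboost.

```python
import pandas as pd

# Workaround sketch for the TypeError above: newer xgboost versions reject
# a pandas Index for feature names, so convert it to a plain list of
# strings first (column names here are illustrative).
columns = pd.Index(["Age", "Workclass", "Education-Num"])
feature_names = [str(c) for c in columns]

print(type(feature_names).__name__)  # list
```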

Reason 2:

E       assert False
E        +  where False = <function allclose at 0x7f3fb8ad7d80>((array([ 9.92112557e-01,  1.29021940e+00, -1.04455401e-01,  4.49388057e-01,\n       -3.24168125e+00, -1.29945894e+00,  4...4415e-01, -8.65381695e-02,  5.50770050e-01,\n        7.92766425e-01,  5.15228463e-01,  9.69419559e-02, -2.50671993e+00]) + array([0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507,\n       0.0392507, 0.0392507, 0.0392507, 0.039...  0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507,\n       0.0392507, 0.0392507, 0.0392507, 0.0392507])), array([ 1.03136337e+00,  1.32947004e+00, -6.52047396e-02,  4.88638729e-01,\n       -3.20243073e+00, -1.26020837e+00,  5...e-02,  5.90020776e-01,\n        8.32017064e-01,  5.54479122e-01,  1.36192635e-01, -2.46746898e+00],\n      dtype=float32))
E        +    where <function allclose at 0x7f3fb8ad7d80> = np.allclose
E        +    and   array([ 9.92112557e-01,  1.29021940e+00, -1.04455401e-01,  4.49388057e-01,\n       -3.24168125e+00, -1.29945894e+00,  4...4415e-01, -8.65381695e-02,  5.50770050e-01,\n        7.92766425e-01,  5.15228463e-01,  9.69419559e-02, -2.50671993e+00]) = <built-in method sum of numpy.ndarray object at 0x7f3d04effb10>(1)
E        +      where <built-in method sum of numpy.ndarray object at 0x7f3d04effb10> = array([[-4.17558436e-01,  1.70742310e-01,  3.98745078e-01,\n         1.04974439e+00,  3.69573566e-01, -3.22893099e-01,\n...-1.38702174e-01,  7.28031911e-02,\n        -4.39721304e-01, -3.74298138e-01, -1.44950810e+00,\n         1.06481231e-01]]).sum
E        +        where array([[-4.17558436e-01,  1.70742310e-01,  3.98745078e-01,\n         1.04974439e+00,  3.69573566e-01, -3.22893099e-01,\n...-1.38702174e-01,  7.28031911e-02,\n        -4.39721304e-01, -3.74298138e-01, -1.44950810e+00,\n         1.06481231e-01]]) = .values =\narray([[-4.17558436e-01,  1.70742310e-01,  3.98745078e-01,\n         1.04974439e+00,  3.69573566e-01, -3.2289...-5.82288434e-01,  1.23781790e+00,\n         2.89367769e-01, -1.00458367e+00, -1.03253236e+00,\n        -9.01071793e-01]]).values
E        +    and   array([0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507,\n       0.0392507, 0.0392507, 0.0392507, 0.039...  0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507, 0.0392507,\n       0.0392507, 0.0392507, 0.0392507, 0.0392507]) = .values =\narray([[-4.17558436e-01,  1.70742310e-01,  3.98745078e-01,\n         1.04974439e+00,  3.69573566e-01, -3.2289...-5.82288434e-01,  1.23781790e+00,\n         2.89367769e-01, -1.00458367e+00, -1.03253236e+00,\n        -9.01071793e-01]]).base_values

tests/explainers/test_tree.py:1285: AssertionError
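The assertion in Reason 2 is an additivity check: the per-row SHAP attributions summed over features, plus the base value, should reconstruct the model's margin output within np.allclose's default tolerances (rtol=1e-5, atol=1e-8). A self-contained sketch of that check, with made-up numbers rather than the values from the log:

```python
import numpy as np

# Additivity check of the kind that fails above: summed per-feature
# attributions plus the base value should match the model output.
shap_values = np.array([[0.2, -0.5, 0.1],
                        [0.3,  0.0, -0.4]])  # illustrative numbers
base_value = 0.05
reconstructed = shap_values.sum(axis=1) + base_value

# Tiny numerical noise stays within np.allclose's default tolerances:
assert np.allclose(reconstructed, reconstructed + 1e-7)

# A shift on the order of the one in the log (~0.04) makes the check fail:
assert not np.allclose(reconstructed, reconstructed + 0.04)
```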

tests/plots/test_bar.py .....F. 

tests/plots/test_beeswarm.py ..F  

tests/plots/test_heatmap.py FF

tests/plots/test_violin.py ..F.. 

tests/plots/test_waterfall.py ..FF. 

Reason:

Error: Image files did not match.

OS: Ubuntu 22.04


Most of these tests also fail in my pull request, although I only touched the torch deep explainer.

And here is the result when running with 0.41.1:

FAILED tests/explainers/test_deep.py::test_tf_keras_mnist_cnn - tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
FAILED tests/explainers/test_gradient.py::test_tf_keras_mnist_cnn - tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
FAILED tests/explainers/test_tree.py::test_gpboost - gpboost.basic.GPBoostError: The argument 'raw_score' is discontinued. Use the renamed equivalent argument 'pred_latent' instead
FAILED tests/explainers/test_tree.py::test_provided_background_tree_path_dependent - TypeError: Expecting a sequence of strings for feature names, got: <class 'pandas.core.indexes.base.Index'>
FAILED tests/explainers/test_tree.py::test_xgboost_classifier_independent_margin - assert False
FAILED tests/plots/test_bar.py::test_simple_bar - Failed: Error: Image files did not match.
FAILED tests/plots/test_beeswarm.py::test_simple_beeswarm - Failed: Error: Image files did not match.
FAILED tests/plots/test_heatmap.py::test_heatmap - Failed: Error: Image files did not match.
FAILED tests/plots/test_heatmap.py::test_heatmap_feature_order - Failed: Error: Image files did not match.
FAILED tests/plots/test_violin.py::test_violin - Failed: Error: Image files did not match.
FAILED tests/plots/test_waterfall.py::test_waterfall - Failed: Error: Image files did not match.
FAILED tests/plots/test_waterfall.py::test_waterfall_legacy - Failed: Error: Image files did not match.

Minimal Reproducible Example

pytest

Traceback

No response

Expected Behavior

Unit tests pass after following CONTRIBUTING.md.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

Commit hash: e8a176e

@noxthot added the bug label Sep 14, 2023
@connortann added the duplicate label Sep 14, 2023
@connortann
Collaborator

connortann commented Sep 14, 2023

I think this is a duplicate of #3254, and should hopefully be fixed by #3255, i.e. by downgrading XGBoost to 1.7.6.

In any case, thank you for opening a bug report. I'll mark it as a duplicate for now, but please let us know if the fix above does not sort your issues out locally and we can investigate further.

@connortann closed this as not planned (duplicate) Sep 14, 2023
@noxthot
Contributor Author

noxthot commented Sep 14, 2023

Thanks for pointing me to the xgboost-related issues; I agree that most of the tests should be fixed by #3255.

But for the two failing tests in

tests/explainers/test_deep.py sF.s.....   
tests/explainers/test_gradient.py F....

I suspect that there is some tensorflow/cuda compatibility issue since it says UNIMPLEMENTED: DNN library is not found.

(running locally by following CONTRIBUTING.md)
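One quick way to narrow down a TensorFlow/CUDA mismatch like this (a diagnostic sketch, assuming tensorflow is installed in the environment) is to print the CUDA and cuDNN versions TensorFlow was built against and compare them with what nvidia-smi reports for the driver:

```python
# Diagnostic sketch (assumes tensorflow is installed): print the CUDA and
# cuDNN versions TensorFlow was built against and the GPUs it can see.
# Compare these against the driver's `nvidia-smi` output to spot a mismatch.
try:
    import tensorflow as tf
    info = tf.sysconfig.get_build_info()
    print("built with CUDA:", info.get("cuda_version"))
    print("built with cuDNN:", info.get("cudnn_version"))
    print("visible GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("tensorflow is not installed in this environment")
```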

@connortann
Collaborator

connortann commented Sep 14, 2023

Yes, it looks like those two failures are something separate. Would you kindly provide the full stack trace for the "DNN library is not found" failure? Is there anything else besides the snippet above?

As you say, it looks like a tensorflow/cuda compatibility error so it might not be specific to SHAP.

@connortann reopened this Sep 14, 2023
@connortann removed the duplicate label Sep 14, 2023
@noxthot
Contributor Author

noxthot commented Sep 15, 2023

Okay, I tested it on another PC with the same OS, and there the tensorflow-related tests passed without problems. I then compared mamba list within the shap environment on both machines, and the outputs matched.

nvidia-smi on the machine where it is not working reports CUDA Version: 12.0, while mamba list shows CUDA 11 installed in the environment. I guess this mismatch could be the cause, or it may just be another problem with my local CUDA installation. Either way, it does not seem to be related to shap itself.

Sorry for the hassle; closing.

Just for the sake of clarity, here is the full stack trace:

=============================================================== test session starts ===============================================================
platform linux -- Python 3.11.5, pytest-7.4.2, pluggy-1.3.0
Matplotlib: 3.7.3
Freetype: 2.6.1
rootdir: /home/noxthot/dlh/external/shap
configfile: pyproject.toml
plugins: cov-4.1.0, mpl-0.16.1
collected 9 items                                                                                                                                 

tests/explainers/test_deep.py sF.s.....                                                                                                     [100%]

==================================================================== FAILURES =====================================================================
_____________________________________________________________ test_tf_keras_mnist_cnn _____________________________________________________________

random_seed = 921

    def test_tf_keras_mnist_cnn(random_seed):
        """ This is the basic mnist cnn example from keras.
        """
        tf = pytest.importorskip('tensorflow')
        rs = np.random.RandomState(random_seed)
        tf.compat.v1.random.set_random_seed(random_seed)
    
        from tensorflow import keras
        from tensorflow.compat.v1 import ConfigProto, InteractiveSession
        from tensorflow.keras import backend as K
        from tensorflow.keras.layers import (
            Activation,
            Conv2D,
            Dense,
            Dropout,
            Flatten,
            MaxPooling2D,
        )
        from tensorflow.keras.models import Sequential
    
        config = ConfigProto()
        config.gpu_options.allow_growth = True
        sess = InteractiveSession(config=config)
    
        tf.compat.v1.disable_eager_execution()
    
        batch_size = 64
        num_classes = 10
        epochs = 1
    
        # input image dimensions
        img_rows, img_cols = 28, 28
    
        # the data, split between train and test sets
        # (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
        x_train = rs.randn(200, 28, 28)
        y_train = rs.randint(0, 9, 200)
        x_test = rs.randn(200, 28, 28)
        y_test = rs.randint(0, 9, 200)
    
        if K.image_data_format() == 'channels_first':
            x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
            x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
            input_shape = (1, img_rows, img_cols)
        else:
            x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
            x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
            input_shape = (img_rows, img_cols, 1)
    
        x_train = x_train.astype('float32')
        x_test = x_test.astype('float32')
        x_train /= 255
        x_test /= 255
    
        # convert class vectors to binary class matrices
        y_train = keras.utils.to_categorical(y_train, num_classes)
        y_test = keras.utils.to_categorical(y_test, num_classes)
    
        model = Sequential()
        model.add(Conv2D(2, kernel_size=(3, 3),
                         activation='relu',
                         input_shape=input_shape))
        model.add(Conv2D(4, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(16, activation='relu')) # 128
        model.add(Dropout(0.5))
        model.add(Dense(num_classes))
        model.add(Activation('softmax'))
    
        model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer=keras.optimizers.legacy.Adadelta(),
                      metrics=['accuracy'])
    
>       model.fit(x_train[:10, :], y_train[:10, :],
                  batch_size=batch_size,
                  epochs=epochs,
                  verbose=1,
                  validation_data=(x_test[:10, :], y_test[:10, :]))

tests/explainers/test_deep.py:120: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/shap/lib/python3.11/site-packages/keras/src/engine/training_v1.py:856: in fit
    return func.fit(
../../../mambaforge/envs/shap/lib/python3.11/site-packages/keras/src/engine/training_arrays_v1.py:734: in fit
    return fit_loop(
../../../mambaforge/envs/shap/lib/python3.11/site-packages/keras/src/engine/training_arrays_v1.py:421: in model_iteration
    batch_outs = f(ins_batch)
../../../mambaforge/envs/shap/lib/python3.11/site-packages/keras/src/backend.py:4609: in __call__
    fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <tensorflow.python.client.session.BaseSession._Callable object at 0x7f3c8c63f910>
args = (array([[[[ 3.7761175e-04],
         [-1.7594481e-03],
         [-3.6678240e-03],
         ...,
         [ 6.8085240e-...0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]], dtype=float32), array(True))
kwargs = {'run_metadata': None}, run_metadata = None, run_metadata_ptr = None

    def __call__(self, *args, **kwargs):
      run_metadata = kwargs.get('run_metadata', None)
      try:
        run_metadata_ptr = tf_session.TF_NewBuffer() if run_metadata else None
>       ret = tf_session.TF_SessionRunCallable(self._session._session,
                                               self._handle, args,
                                               run_metadata_ptr)
E                                              tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
E                                                (0) UNIMPLEMENTED: DNN library is not found.
E                                                [[{{node conv2d/Conv2D}}]]
E                                                [[dropout_1/cond/then/_15/dropout/GreaterEqual/_201]]
E                                                (1) UNIMPLEMENTED: DNN library is not found.
E                                                [[{{node conv2d/Conv2D}}]]
E                                              0 successful operations.
E                                              0 derived errors ignored.

../../../mambaforge/envs/shap/lib/python3.11/site-packages/tensorflow/python/client/session.py:1482: UnimplementedError

@noxthot closed this as completed Sep 15, 2023