Avoid generating NaNs in UnwoundPathSum #3943
Conversation
Thanks @Kodiologist! I think your change retains the correctness of the method. If this resolves the NaNs, then the issue must have been that the proportion of training samples and/or the depth weighting caused an underflow, in which case the terms would fail to cancel out. Unless anyone else has comments, I think this should be merged.
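If that underflow diagnosis is right, the failure mode is easy to demonstrate in isolation: once a weight underflows to exactly zero, a ratio whose factors should cancel becomes 0/0, which C++ floating-point arithmetic silently evaluates to NaN. A minimal sketch of the mechanism and the guard (this is illustrative, not xgboost's code; `safe_ratio` is a hypothetical helper):

```python
import math

# Smallest positive subnormal double; any further downward scaling
# underflows to exactly 0.0.
tiny = 5e-324
assert tiny > 0.0
assert tiny * 0.1 == 0.0  # silent underflow to zero

# In C++, 0.0 / 0.0 silently yields NaN; guarding the division avoids it.
def safe_ratio(num, den, fallback=0.0):
    return num / den if den != 0.0 else fallback

print(safe_ratio(0.0, 0.0))  # prints 0.0 instead of NaN
assert math.isfinite(safe_ratio(0.0, 0.0))
```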
Force-pushed from 8f0ef79 to 945f6e7
I remembered that the SHAP values for an observation always sum to the prediction for that observation, so we can use that as a check. It turns out this works! Hooray! I've incorporated it into the test script above, which should probably be put into XGBoost's test suite in some form. By the way, I think the Makefile for this project may not be tracking all dependencies correctly, because I found that I had to do
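The additivity check mentioned above can be written generically: the base (expected) value plus an observation's SHAP values should reproduce the model's prediction, and a NaN fails the check automatically because comparisons with NaN are false. A sketch with made-up numbers (the arrays are stand-ins, not real model output):

```python
import math

shap_values = [[0.2, -0.1, 0.3],    # per-feature attributions, one row per observation
               [0.05, 0.15, -0.2]]
expected_value = 0.5                # the model's base (expected) value
predictions = [0.9, 0.5]            # the model's output for each observation

for row, pred in zip(shap_values, predictions):
    total = expected_value + sum(row)
    assert math.isfinite(total)      # catches NaNs and infinities
    assert abs(total - pred) < 1e-6  # SHAP additivity property
```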
@Kodiologist Thanks a lot for your contribution. Before merging this in, let me add your sample snippet as a unit test so that the bug doesn't creep back in later. As for dependency management, yes, I think there is some issue with the Makefile. I usually use CMake instead:
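For reference, a typical out-of-source CMake build looks like the following (paths and flags are illustrative; check xgboost's build documentation for the exact options):

```shell
# From the xgboost source root; build in a separate directory so CMake
# can track C++ dependencies and rebuild only what changed.
mkdir -p build
cd build
cmake ..
make -j4
```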
Then every change in the C++ code will cause the affected files to be rebuilt.
Oh, okay. You might want to add the CMake invocation to https://xgboost.readthedocs.io/en/latest/build.html#building-on-ubuntu-debian .
@hcho3 did you ever end up adding that sample snippet?
Also, it looks like the relevant code was moved to tree_model.cc since the initial PR.
@slundberg Sorry, I've been away on vacation. I'll rebase this PR against the latest master now.
Kudos to Jakub Zakrzewski for tracking down the bug.
@Kodiologist @slundberg Okay, looks like the rebasing was successful. I'll add the test snippet soon.
Codecov Report
@@            Coverage Diff             @@
##           master    #3943      +/-   ##
==========================================
- Coverage   60.79%   60.77%    -0.02%
==========================================
  Files         130      130
  Lines       11699    11702        +3
==========================================
  Hits         7112     7112
- Misses       4587     4590        +3

Continue to review the full report at Codecov.
@Kodiologist @slundberg I added the test script. It should be good to merge once all checks pass. |
@Kodiologist Merged. Thanks for your contribution. |
* Avoid generating NaNs in UnwoundPathSum. Kudos to Jakub Zakrzewski for tracking down the bug. * Add a test
I think this issue still persists. I upgraded to the latest pip version, v0.27.0, but the NaNs still appear.
@astrixg The latest version is 0.81. Can you upgrade? If the issue persists, you should create a new issue. |
https://pypi.org/project/shap/ @hcho3 It's 0.28 on pip and 0.27 on conda. Am I installing from the wrong source?
@astrixg hcho3 is talking about the xgboost package, not shap.
This is my attempt to fix the bug that creates NaN SHAP values (#3333). I have an example R script (below) that produces NaN SHAPs without this patch and doesn't produce any with it, but I don't know whether my change is statistically correct.
The script is:
Its output without the change is:
Its output with the change is:
CC @slundberg, @allanjust, @liuyanguu
Fixes #3333.