Metric for robustness in real-world scenarios #199
Comments
That's actually the mean (squared) error, also referred to as the Brier score in this scenario 🙄
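For reference, with predicted probabilities $p_i$ and binary ground-truth labels $y_i \in \{0, 1\}$ over $N$ test samples, the Brier score is

$$\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2,$$

i.e. exactly the mean squared error of the scores against the labels; lower is better, and confidently wrong predictions are penalised heavily.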
@alexriedel1, do you think this is something we should be adding to the repo? If yes, would you be able to provide more detail regarding the use case?
Let me elaborate a bit more. Neither the AUROC nor the F1 score captures the prediction confidence. While we have reached close to total recall on datasets like MVTec, being confident about a prediction matters more than merely assigning the right category. Especially in real-world scenarios, where an anomaly detection pipeline is set up for people to use on a daily basis, it is important that minor variances do not lead to wrong predictions (which can happen when a predictor is not very confident about its prediction). The Brier score is basically the MSE (loss) of the predictions. The code below illustrates the issue.
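As a rough sketch of that point (the scores below are made up, and `brier_score` is just an illustrative helper, not an anomalib API): two classifiers can rank a test set identically, and therefore share the same AUROC and F1, while one of them is far less confident.

```python
import torch

# Hypothetical ground-truth labels (1 = anomalous, 0 = normal) and the anomaly
# scores of two classifiers that rank the samples in exactly the same order.
targets = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Classifier A is confident: normal scores near 0, anomalous scores near 1.
scores_a = torch.tensor([0.05, 0.10, 0.08, 0.95, 0.90, 0.97])

# Classifier B barely separates the two classes around the 0.5 threshold.
scores_b = torch.tensor([0.45, 0.48, 0.47, 0.55, 0.52, 0.56])


def brier_score(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Brier score: mean squared error between scores and binary targets."""
    return torch.mean((scores - targets) ** 2)


# Both classifiers reach perfect AUROC and (at a 0.5 threshold) perfect F1,
# but the Brier score exposes the difference in confidence.
print(brier_score(scores_a, targets))  # ~0.005
print(brier_score(scores_b, targets))  # ~0.213
```

On this toy set both classifiers look identical to AUROC and F1, yet classifier B's Brier score is roughly 40 times worse, which is exactly the confidence gap those metrics cannot see.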
@alexriedel1, thanks for clarifying this. This would indeed be useful in certain use cases. We could definitely add this metric to the repo. Our priority at the moment, however, is to make the metrics optional for the user. Metrics are currently hardcoded and computed for each run. Ideally, the user should be able to choose which metrics they want to see. They could of course choose to see all of the metrics, as happens now, but this is not very efficient in terms of memory use.
Should I write a PR addressing this?
Thanks @alexriedel1. @djdameln is already working on a design for this. In the meantime, you could perhaps add the Brier score implementation, if that's alright with you?
@samet-akcay @djdameln In rough pseudocode it would look something like this:
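A sketch of what that could look like as a torchmetrics `Metric` (the class name `BrierScore` and its state names are illustrative, not anomalib's actual API):

```python
import torch
from torchmetrics import Metric


class BrierScore(Metric):
    """Mean squared error between predicted anomaly scores and binary labels."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.add_state("sum_squared_error", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # preds: anomaly scores in [0, 1]; target: binary ground-truth labels.
        self.sum_squared_error += torch.sum((preds - target.float()) ** 2)
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        return self.sum_squared_error / self.total
```

Because it follows the standard `update`/`compute` interface, it could live alongside the existing AUROC and F1 metrics and be aggregated over batches in the usual way.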
I hope this makes my intention a bit clearer.
@alexriedel1, thanks, that is clear. You could perhaps have a look at the WIP PR #230.
Solved with #230. The desired metric can be included by adding it to the metrics configuration.
Hey all,
in real-world scenarios, a useful metric for a classifier would be the anomaly-score distance between all normal and anomalous predictions in a test set. This would help to choose classifiers that are more confident in their predictions.
Do you know if a metric exists that implements this for two-class classification problems?
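For concreteness, one naive way to quantify such a distance would be the gap between the mean anomaly score of the anomalous samples and that of the normal samples (the tensors below are made-up example values):

```python
import torch

# Hypothetical anomaly scores and binary labels (1 = anomalous) for a test set.
scores = torch.tensor([0.05, 0.12, 0.08, 0.91, 0.88, 0.95])
labels = torch.tensor([0, 0, 0, 1, 1, 1])

# Gap between the mean score of anomalous samples and of normal samples;
# a larger gap indicates a classifier that is more confident in its decisions.
separation = scores[labels == 1].mean() - scores[labels == 0].mean()
print(separation)  # ~0.83
```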