Add binning capability to AUPRO #1145
Conversation
@jpcbertoldo, can you review this PR please?
Nice proposal, I think this is very useful : )
Although, I'm not sure I understood this:
Unlike TorchMetrics' ROC or PrecisionRecallCurve, the predictions in AUPRO are not scaled between 0 and 1 when at least one prediction is greater than 1. This can create user misunderstanding/misusage when the user is used to providing thresholds between 0 and 1 to match the TorchMetrics strategy. In my opinion, this should be updated to be consistent with TorchMetrics.
What I understood looking at torchmetrics' code is that if any value in preds is > 1 or < 0, then preds = preds.sigmoid() is applied here, because preds is assumed to be a logit, which is not the case for all AD models. However, thresholds is not transformed.
So if we don't do the normalization to 0-1 on our side, the new argument is only useful for cases where (indeed) the model gives logits or probabilities, and the user can only provide probabilities. Distance-based models (for example) wouldn't take any advantage -- how could you set a meaningful range of distance thresholds, while the distances would be transformed by .sigmoid()?
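To illustrate (the values below are just made up): distance-like scores get squashed by the sigmoid, while a threshold grid passed by the user is left on the original scale, so the two no longer match.

import torch

preds = torch.tensor([0.5, 3.0, 7.0, 12.0])   # distance-like anomaly scores
squashed = preds.sigmoid()                     # what torchmetrics applies when any value falls outside [0, 1]
print(squashed)                                # ~[0.62, 0.95, 0.999, 1.0] -> resolution collapses near 1

thresholds = torch.linspace(0.0, 15.0, 4)      # distance-valued thresholds stay untouched,
                                               # so they no longer live on the same scale as the squashed preds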
Perhaps it'd be nice to cover this? Or at least make this clear in the doc.
This is a tricky topic. Many anomaly models produce distance-based scores which are not confined to a [0, 1] range. Enforcing these predictions to fall within this range by using a sigmoid leads to several problems, such as a reduced resolution in the threshold computation (due to the squeezing effect of the sigmoid). In the more extreme case we might actually not be able to obtain a meaningful metric computation due to numerical limitations. Because of this, we asked the TorchMetrics developers to make the sigmoid operation optional.
So, since most anomaly models use unbounded distance-based anomaly scores, it is not a good idea to apply the sigmoid function in our performance metrics. At the same time, this makes it difficult to apply your proposed binned AUPRO metric in practice. We don't know the upper and lower bounds of the model predictions beforehand, which makes it challenging to choose an adequate range of thresholds when instantiating the metric class. A possibility could be to infer the range of threshold values from the stored predictions at the beginning of the metric computation.
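As a sketch of that last idea (the helper name and the num_thresholds default are hypothetical, not part of this PR), the threshold grid could be derived from the predictions the metric has already accumulated, right before computing the curve:

import torch

def infer_thresholds(stored_preds, num_thresholds: int = 100) -> torch.Tensor:
    # Derive a threshold grid from the accumulated predictions instead of asking
    # the user to guess the (unknown) score range up front.
    flat = torch.cat([p.flatten() for p in stored_preds])
    return torch.linspace(flat.min().item(), flat.max().item(), num_thresholds)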
Thanks for your PR! I have some concerns about the usability of this feature, given that anomaly models generally do not generate predictions in the 0-1 range. See my other comment for more details.
I think that a min-max normalization on anomalib's side (therefore before torchmetrics) could work. Perhaps a warning for extreme cases (e.g. very far away outliers) would be necessary, but the warning logic may become a point of complication.
If the goal is to make it binnable, I think we should make sure to apply the same normalization to the thresholds (when the values are given), because otherwise you have to provide thresholds in the normalized range. Or we could just enable the option with…
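A minimal sketch of that suggestion (the helper is hypothetical, not part of this PR): normalize the predictions with min-max and map any user-supplied thresholds with the same statistics, so both stay on the same scale and torchmetrics' sigmoid remapping is never triggered.

import torch

def normalize_with_thresholds(preds: torch.Tensor, thresholds: torch.Tensor):
    # Use the same min/max for predictions and thresholds so they remain comparable.
    lo, hi = preds.min(), preds.max()
    scale = (hi - lo).clamp_min(1e-12)  # guard against constant predictions
    return (preds - lo) / scale, (thresholds - lo) / scale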
If we have defective data available we should be able to do min-max normalization. The issue is just that for very big datasets, you no longer have the capacity to store everything in RAM, so we would potentially need to do two passes over the data (or apply some parametric logit function / some safety margins around the scales).
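A rough sketch of the two-pass idea, assuming the scores can be streamed batch by batch (the score_batches generator below is only a hypothetical stand-in): the first pass tracks the running min/max without storing anything, and the second pass would update a binned metric built from that range.

import torch

def score_batches():
    # Hypothetical stand-in for streaming anomaly scores batch by batch.
    for _ in range(3):
        yield torch.rand(4, 64, 64) * 10

running_min, running_max = float("inf"), float("-inf")
for preds in score_batches():  # first pass: only track the score range
    running_min = min(running_min, preds.min().item())
    running_max = max(running_max, preds.max().item())

thresholds = torch.linspace(running_min, running_max, 100)
# A second pass would then update the binned metric batch by batch using `thresholds`.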
Unlike torchmetrics' PrecisionRecallCurve for instance, the update of this AUPRO metric only appends to self.target and self.preds and does not update the metric's state incrementally. So I guess the RAM issue will happen anyway. That being said, I would say that a mix of @djdameln's and @jpcbertoldo's proposals could work:
@djdameln @jpcbertoldo Does this solution look good to you?
Thanks @djdameln for the explanation. I guess disabling the formatting will indeed be the solution, and it will allow moving the anomalib AUPR and AUROC to the new TorchMetrics versions more easily and consistently!
Not sure I understood your point @ORippler, but if the only option for the…
Thanks for the changes @Yann-CV. I think this solution is fine. However, as you said, this won't change the memory consumption in either case. Either way I think adding this option is valid, but we should aim to properly take advantage of the use case.
@Yann-CV I'm happy with this new approach. Thanks again for your efforts!
nice job @Yann-CV : )
@Yann-CV, code quality checks are failing. Can you please run one of the following two commands, whichever you prefer?
pre-commit run --all-files
tox -e pre-commit
The advantage of the binning is to keep a constant size for the output and to reduce the number of operations during the metric computation, which makes it more suitable for big datasets. Unfortunately (and frustratingly), with the current implementation it seems to slow down the computation. If you want to see for yourself, I wrote this script:
And the output on my computer is:
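(The script and its output are not reproduced here. Purely as an illustration, a timing comparison along those lines could look like the sketch below; the anomalib.utils.metrics import path and the thresholds constructor argument are assumptions based on this PR, and the shapes/dtypes are only illustrative.)

import time

import torch

from anomalib.utils.metrics import AUPRO  # import path assumed

torch.manual_seed(0)
preds = torch.rand(8, 256, 256)                       # fake anomaly maps
target = torch.zeros(8, 256, 256, dtype=torch.int)    # fake ground-truth masks
target[:, 100:140, 100:140] = 1                       # one square anomalous region per image

metrics = {
    "unbinned": AUPRO(),
    "binned": AUPRO(thresholds=torch.linspace(0, 1, 100)),  # argument proposed in this PR
}
for name, metric in metrics.items():
    metric.update(preds, target)
    start = time.perf_counter()
    metric.compute()
    print(f"{name}: {time.perf_counter() - start:.3f} s")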
@jpcbertoldo I made a small test and I am obtaining these results:
@djdameln @jpcbertoldo thanks a lot for the review ;-)
fpr: Tensor = binary_roc(
    preds=preds,
    target=target,
    thresholds=thresholds,
)[0]  # only need fpr
Can you use the roc function here instead? The reason is that binary_roc maps the predictions to the [0, 1] range using sigmoid, which is exactly what we're trying to avoid. We asked the TorchMetrics developers to make the sigmoid mapping optional, but until then it would be better if we use the legacy roc function, which does not remap the predictions.
@djdameln in fact roc() will just call binary_roc() (the former is just a wrapper dispatching on the argument task: str):
if task is not None:
    if task == "binary":
        return binary_roc(preds, target, thresholds, ignore_index, validate_args)
    if task == "multiclass":
        assert isinstance(num_classes, int)
        return multiclass_roc(preds, target, num_classes, thresholds, ignore_index, validate_args)
    if task == "multilabel":
        assert isinstance(num_labels, int)
        return multilabel_roc(preds, target, num_labels, thresholds, ignore_index, validate_args)
    raise ValueError(
        f"Expected argument `task` to either be `'binary'`, `'multiclass'` or `'multilabel'` but got {task}"
    )
else:
    rank_zero_warn(
        "From v0.10 an `'binary_*'`, `'multiclass_*'`, `'multilabel_*'` version now exist of each classification"
        " metric. Moving forward we recommend using these versions. This base metric will still work as it did"
        " prior to v0.10 until v0.11. From v0.11 the `task` argument introduced in this metric will be required"
        " and the general order of arguments may change, such that this metric will just function as an single"
        " entrypoint to calling the three specialized versions.",
        DeprecationWarning,
    )
preds, target, num_classes, pos_label = _roc_update(preds, target, num_classes, pos_label)
return _roc_compute(preds, target, num_classes, pos_label, sample_weights)
Indeed, the roc call with thresholds:
- requires the task to be defined, otherwise the thresholds will just be silently ignored.
- automatically calls binary_roc at the end.
Since this roc function will soon be deprecated, I would suggest keeping it like this.
Also @djdameln, we added a comment just above to mention this issue.
@jpcbertoldo @djdameln is anything else needed from me to merge this pull request?
Sorry for the late reply. I don't see anything else.
@djdameln any last opinion here?
@samet-akcay the code quality check failed here after the merge. Let me know if I can help on this ;-)
@Yann-CV, no worries, this PR has been outstanding for too long. We'll sort out the rest! Thanks a lot for your contribution and patience!
Description
Metric computation can often be made more memory efficient and faster by using predefined thresholds provided as input. This binned computation is a bit less accurate, but it can be really useful when evaluating very big datasets.
The current implementation of the AUPRO does not allow binned computation. Here is a proposal, assuming that the target can only contain {0, 1}:
- add a thresholds attribute
- call the ROC computation with the binary task (otherwise thresholds is not usable) and use the thresholds as argument.
Note:
Unlike TorchMetrics' ROC or PrecisionRecallCurve, the predictions in AUPRO are not scaled between 0 and 1 when at least one prediction is greater than 1. This can create user misunderstanding/misusage when the user is used to providing thresholds between 0 and 1 to match the TorchMetrics strategy. In my opinion, this should be updated to be consistent with TorchMetrics.
I may be wrong, but it seems that the anomalib AUPR and AUROC cannot take advantage of the binned versions in TorchMetrics because of the backward-compatibility design in TorchMetrics. Indeed, when we specify the task type, a new object is created which is no longer the anomalib object inheriting from the TorchMetrics parent class. In my opinion, adding binning capability to both of these metrics would also require separate pull requests. On my side, I moved toward using only torchmetrics calls!
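For illustration, a minimal sketch of that behaviour with recent TorchMetrics versions (the subclass is hypothetical, only meant to mimic an anomalib-style metric): specifying the task makes the wrapper class return a task-specific instance, so the subclass never ends up with an instance of its own type.

import torchmetrics

class MyAUROC(torchmetrics.AUROC):
    """Hypothetical anomalib-style subclass of the TorchMetrics wrapper."""

metric = MyAUROC(task="binary")
# The wrapper dispatches to a task-specific class in __new__, so this most likely
# prints "BinaryAUROC" rather than "MyAUROC":
print(type(metric).__name__)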
Changes
Checklist