The way the symmetric KL loss is implemented here (for the SIFT loss)

DeBERTa/DeBERTa/sift/sift.py
Line 180 in 4d7fe0b

differs from the symmetrized Kullback-Leibler divergence; in particular, it is not zero when both inputs are equal, as would be expected (see https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Symmetrised_divergence). In fact it seems to equal twice the entropy in that case (when the inputs are equal), which would intuitively push the model toward higher-confidence predictions.

Other implementations, e.g. https://github.com/archinetai/smart-pytorch/blob/e96d8630dc58e1dce8540f61f91016849925ebfe/smart_pytorch/loss.py#L10, behave more like I would have expected from the name. Is there a reason to deviate from the more standard definition?
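
For reference, here is a minimal sketch (assuming PyTorch and logits as inputs; the function and variable names are mine, not taken from sift.py) of the textbook symmetrized KL next to a plain sum of cross-entropies, a hypothetical variant that reproduces the "twice the entropy when the inputs are equal" behaviour described above:

```python
import torch
import torch.nn.functional as F

def symmetric_kl(logits_p, logits_q):
    # Textbook symmetrized KL: KL(P||Q) + KL(Q||P); zero when P == Q.
    log_p = F.log_softmax(logits_p, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")  # KL(P||Q)
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")  # KL(Q||P)
    return kl_pq + kl_qp

def summed_cross_entropy(logits_p, logits_q):
    # Hypothetical variant for comparison: -sum p*log q - sum q*log p.
    # When P == Q this reduces to 2*H(P), i.e. twice the entropy
    # (not claimed to be the actual sift.py code).
    log_p = F.log_softmax(logits_p, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    p, q = log_p.exp(), log_q.exp()
    return (-(p * log_q).sum(-1) - (q * log_p).sum(-1)).mean()

logits = torch.randn(4, 10)
print(symmetric_kl(logits, logits))          # ~0 when both inputs are equal
print(summed_cross_entropy(logits, logits))  # ~2 * entropy of softmax(logits)
```

Minimizing the cross-entropy form also penalizes the entropy of each distribution on its own, which is consistent with the intuition above that it would drive predictions toward higher confidence.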