Hi,
For the gradients list in this function: https://github.com/jacobgil/vit-explain/blob/main/vit_grad_rollout.py#L9
do we need to reverse the gradients? The attentions are accumulated during the forward pass, while the gradients are accumulated during the backward pass, so the two lists end up in opposite layer orders. To multiply each attention map with its matching gradient, we need to reverse the gradients: gradients = gradients[::-1].
What do you think? Thanks
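For concreteness, here is a minimal sketch of what the proposed fix would look like (an assumed structure, not the repo's actual grad_rollout code): reverse the gradients list first, so that when the two lists are zipped, each attention map meets its own layer's gradient.

```python
# Hedged sketch, assuming `attentions` is filled by forward hooks in
# layer order 0..L-1 and `gradients` by backward hooks in order L-1..0.
import torch

def grad_rollout(attentions, gradients):
    gradients = gradients[::-1]  # proposed fix: restore layer order
    num_tokens = attentions[0].size(-1)
    result = torch.eye(num_tokens)
    with torch.no_grad():
        for attn, grad in zip(attentions, gradients):
            # Weight each head's attention by its gradient; keep positives
            fused = (attn * grad).clamp(min=0).mean(dim=1)
            # Add the residual connection and renormalize the rows
            fused = fused + torch.eye(num_tokens)
            fused = fused / fused.sum(dim=-1, keepdim=True)
            result = fused @ result  # accumulate the rollout
    return result
```

Without the reversal, the first attention map (layer 0) would be weighted by the last layer's gradient, and vice versa.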
I added a print to the forward (`f`) and backward (`b`) hooks and found that:
f 140200126500192
f 140200126548912
f 140200126499136
f 140200206491504
f 140200000463632
f 140200000464592
f 140200000465552
f 140200000466512
f 140200000172624
f 140200000173584
f 140200000174544
b 140200000174544
b 140200000173584
b 140200000172624
b 140200000466512
b 140200000465552
b 140200000464592
b 140200000463632
b 140200206491504
b 140200126499136
b 140200126548912
b 140200126500192
b 140200126500144
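For reference, here is a minimal toy repro of that kind of printout (a hypothetical stand-in model, not the original script), showing that forward hooks fire front-to-back while backward hooks fire back-to-front:

```python
import torch
import torch.nn as nn

# Toy stand-in for the ViT blocks: four linear layers in sequence.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

for layer in model:
    # Print the module id when its forward / backward hook fires.
    layer.register_forward_hook(lambda m, inp, out: print('f', id(m)))
    layer.register_full_backward_hook(lambda m, gin, gout: print('b', id(m)))

x = torch.randn(2, 8)
model(x).sum().backward()
# The 'f' lines print in layer order 0..3, the 'b' lines in order 3..0,
# matching the printout above.
```

So the two lists really are collected in opposite orders, which supports reversing `gradients` before zipping it with `attentions`.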