Hi,
For the gradients list in this function: https://github.com/jacobgil/vit-explain/blob/main/vit_grad_rollout.py#L9
do we need to reverse the gradients? The attentions are accumulated during the forward pass, while the gradients are accumulated during the backward pass, so the two lists end up in opposite layer orders. To multiply each attention map with its matching gradient, we need to reverse the gradients: gradients = gradients[::-1].
What do you think? Thanks
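For concreteness, here is a minimal sketch of what the proposed fix would look like (an assumed structure, not the repo's actual grad_rollout code): reverse the gradients list first, so that when the two lists are zipped, each attention map meets its own layer's gradient.

```python
# Hedged sketch, assuming `attentions` is filled by forward hooks in
# layer order 0..L-1 and `gradients` by backward hooks in order L-1..0.
import torch

def grad_rollout(attentions, gradients):
    gradients = gradients[::-1]  # proposed fix: restore layer order
    num_tokens = attentions[0].size(-1)
    result = torch.eye(num_tokens)
    with torch.no_grad():
        for attn, grad in zip(attentions, gradients):
            # Weight each head's attention by its gradient; keep positives
            fused = (attn * grad).clamp(min=0).mean(dim=1)
            # Add the residual connection and renormalize the rows
            fused = fused + torch.eye(num_tokens)
            fused = fused / fused.sum(dim=-1, keepdim=True)
            result = fused @ result  # accumulate the rollout
    return result
```

Without the reversal, the first attention map (layer 0) would be weighted by the last layer's gradient, and vice versa.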
I added a print to the forward (`f`) and backward (`b`) hooks and found that:
f 140200126500192
f 140200126548912
f 140200126499136
f 140200206491504
f 140200000463632
f 140200000464592
f 140200000465552
f 140200000466512
f 140200000172624
f 140200000173584
f 140200000174544
b 140200000174544
b 140200000173584
b 140200000172624
b 140200000466512
b 140200000465552
b 140200000464592
b 140200000463632
b 140200206491504
b 140200126499136
b 140200126548912
b 140200126500192
b 140200126500144
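For reference, here is a minimal toy repro of that kind of printout (a hypothetical stand-in model, not the original script), showing that forward hooks fire front-to-back while backward hooks fire back-to-front:

```python
import torch
import torch.nn as nn

# Toy stand-in for the ViT blocks: four linear layers in sequence.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

for layer in model:
    # Print the module id when its forward / backward hook fires.
    layer.register_forward_hook(lambda m, inp, out: print('f', id(m)))
    layer.register_full_backward_hook(lambda m, gin, gout: print('b', id(m)))

x = torch.randn(2, 8)
model(x).sum().backward()
# The 'f' lines print in layer order 0..3, the 'b' lines in order 3..0,
# matching the printout above.
```

So the two lists really are collected in opposite orders, which supports reversing `gradients` before zipping it with `attentions`.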