
Gradients in Grad Rollout #6

Open
ilovecv opened this issue May 20, 2021 · 1 comment

Comments


ilovecv commented May 20, 2021

Hi,
For the gradients list in this function: https://github.com/jacobgil/vit-explain/blob/main/vit_grad_rollout.py#L9
do we need to reverse the gradients? The attentions are accumulated during the forward pass, while the gradients are accumulated during the backward pass, so to multiply each attention with its corresponding gradient we need to reverse the gradients: `gradients = gradients[::-1]`.

What do you think? Thanks
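
For clarity, here is a rough sketch of what I mean (the tensors and shapes below are just placeholders, assuming the rollout pairs `attentions[i]` with `gradients[i]` positionally, e.g. via `zip`):

```python
import torch

# Illustrative only: dummy tensors standing in for what the hooks collect.
num_blocks, heads, tokens = 3, 4, 10
attentions = [torch.rand(1, heads, tokens, tokens) for _ in range(num_blocks)]  # forward hooks: block 0 .. N-1
gradients = [torch.rand(1, heads, tokens, tokens) for _ in range(num_blocks)]   # backward hooks: block N-1 .. 0

gradients = gradients[::-1]  # proposed fix: align gradient order with attention order

for attention, grad in zip(attentions, gradients):
    # each block's attention map weighted by its own gradient, averaged over heads
    weighted = (attention * grad).mean(dim=1)
```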


ymp5078 commented Apr 5, 2022

I agree with @ilovecv.

I checked the id of each module using the following code and found that the order is reversed. Please let me know if you have any thoughts.

```python
def get_attention(self, module, input, output):
    print('f', id(module))
    self.attentions.append(output.cpu())

def get_attention_gradient(self, module, grad_input, grad_output):
    print('b', id(module))
    self.attention_gradients.append(grad_input[0].cpu())
```

and found that

```
f 140200126500192
f 140200126548912
f 140200126499136
f 140200206491504
f 140200000463632
f 140200000464592
f 140200000465552
f 140200000466512
f 140200000172624
f 140200000173584
f 140200000174544
b 140200000174544
b 140200000173584
b 140200000172624
b 140200000466512
b 140200000465552
b 140200000464592
b 140200000463632
b 140200206491504
b 140200126499136
b 140200126548912
b 140200126500192
b 140200126500144
```
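
The ordering can be reproduced with a toy model (illustrative names only, and using `register_full_backward_hook` rather than the deprecated `register_backward_hook`):

```python
import torch
import torch.nn as nn

# Forward hooks fire in module order; backward hooks fire in reverse order,
# so a list filled by a backward hook comes out back-to-front.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(3)])

forward_order, backward_order = [], []
for layer in model:
    layer.register_forward_hook(lambda m, i, o: forward_order.append(id(m)))
    layer.register_full_backward_hook(lambda m, gi, go: backward_order.append(id(m)))

out = model(torch.randn(1, 8))
out.sum().backward()

print(forward_order == backward_order[::-1])  # True: backward order is the reverse of forward order
```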
