Error in 'KD-based Answer Assignment' #6

Open
hegdekartik opened this issue Jun 9, 2023 · 3 comments

@hegdekartik

Hi,

Thanks for the great work. I found it interesting and wanted to try it out, but we are getting errors in the 'KD-based Answer Assignment' step.

Running the following command produces the error below:

CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 5 --output lmh_css --seed 2048

Traceback (most recent call last):
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/main.py", line 178, in <module>
    main()
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/main.py", line 175, in main
    train(model, train_loader, eval_loader, args,qid2type)
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/train.py", line 219, in train
    word_grad = torch.autograd.grad((pred * (a > 0).float()).sum(), word_emb, create_graph=True)[0]
  File "/home

So we tried the other option: using a pretrained CSS teacher model downloaded from CSS-VQA. Unfortunately, after downloading 'model.pth' and running the 'Assign new answer' command, we got the error below.

CUDA_VISIBLE_DEVICES=0 python assign_answer.py --dataset v2 --name number --split high
DATASET LEN 443757
100%|███████████████████████████████████████████████████| 443757/443757 [00:00<00:00, 946121.94it/s]
100%|███████████████████████████████████████████████████| 443757/443757 [00:02<00:00, 167279.98it/s]
Get language bias, which is an input of CSS teacher model.
loading dictionary from data/dictionary.pkl
tokenize: 100%|██████████████████████████████████████████| 443757/443757 [00:04<00:00, 97819.23it/s]
tensorize: 100%|████████████████████████████████████████| 443757/443757 [00:04<00:00, 109012.99it/s]
Load model from: ./logs/lmh_css/model.pth
Traceback (most recent call last):
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/assign_answer.py", line 171, in <module>
    ood_model.load_state_dict(model_state)
  File "/home/kartik/.conda/envs/BLIP_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for BaseModel:
        size mismatch for classifier.main.3.bias: copying a param with shape torch.Size([2274]) from checkpoint, the shape in current model is torch.Size([2410]).
        size mismatch for classifier.main.3.weight_v: copying a param with shape torch.Size([2274, 2048]) from checkpoint, the shape in current model is torch.Size([2410, 2048]).

How can I get rid of this error?

Thank you
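The two size-mismatch lines show that the checkpoint's final classifier layer has 2274 outputs while the model built by assign_answer.py expects 2410. Because that layer's width is normally the number of candidate answers, a likely (but unconfirmed) reading is that this model.pth was trained with a different answer vocabulary from the one KDDAug builds. The sketch below compares checkpoint and model shapes before load_state_dict is called, so a mismatched checkpoint can be spotted up front. It uses hypothetical names (shape_mismatches, ToyClassifier) and a toy stand-in model rather than the repository's BaseModel, and it assumes the checkpoint file holds a plain state_dict.

```python
import torch
import torch.nn as nn


def shape_mismatches(model: nn.Module, state_dict: dict) -> list:
    """Return (key, checkpoint_shape, model_shape) for every entry whose shape
    differs; keys present on only one side are reported with shape None."""
    model_state = model.state_dict()
    report = []
    for key in sorted(set(model_state) | set(state_dict)):
        ckpt_shape = tuple(state_dict[key].shape) if key in state_dict else None
        model_shape = tuple(model_state[key].shape) if key in model_state else None
        if ckpt_shape != model_shape:
            report.append((key, ckpt_shape, model_shape))
    return report


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a 'main' Sequential whose last Linear layer has
    one output per candidate answer, so only main.3 depends on vocab size."""

    def __init__(self, num_answers: int):
        super().__init__()
        self.main = nn.Sequential(
            nn.Linear(8, 16),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(16, num_answers),
        )


if __name__ == "__main__":
    # Stands in for torch.load("./logs/lmh_css/model.pth", map_location="cpu").
    checkpoint = ToyClassifier(2274).state_dict()
    current = ToyClassifier(2410)
    for key, ckpt_shape, model_shape in shape_mismatches(current, checkpoint):
        print(f"{key}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With a real checkpoint, the toy state_dict would be replaced by the loaded file and the toy model by the one the script constructs; an empty report means the shapes line up. Note that load_state_dict(..., strict=False) only tolerates missing or unexpected keys, not size mismatches, so a checkpoint with a different answer count still has to be matched to the right vocabulary.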

@ItemZheng
Owner

For the first error, can you provide more error logs? For the second error, you were missing the argument --teacher_path; the full command, as given in README.md, is CUDA_VISIBLE_DEVICES=0 python assign_answer.py --dataset [cpv2/v2] --name number --split high --teacher_path [].

@hegdekartik
Author

For the second error: --teacher_path is an optional argument, so we placed model.pth at the path used by assign_answer.py, './logs/lmh_css/model.pth'.

Could you please provide a link to the correct model.pth for this step?

Error logs for the first error:

Building train dataset...
caching-features: 100%|████████████████████████████████████| 443757/443757 [38:56<00:00, 189.96it/s]
tokenize: 100%|█████████████████████████████████████████| 443757/443757 [00:03<00:00, 119740.31it/s]
tensorize: 100%|████████████████████████████████████████| 443757/443757 [00:04<00:00, 106497.75it/s]
Building test dataset...
caching-features: 100%|████████████████████████████████████| 214354/214354 [18:59<00:00, 188.16it/s]
tokenize: 100%|██████████████████████████████████████████| 214354/214354 [00:04<00:00, 48356.19it/s]
tensorize: 100%|████████████████████████████████████████| 214354/214354 [00:01<00:00, 109298.11it/s]
Starting training...
Epoch 1:   0%|                                                              | 0/867 [00:00<?, ?it/s]
/home/kartik/.conda/envs/BLIP_env/lib/python3.10/site-packages/torch/nn/functional.py:1967: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Epoch 1:   0%|                                                              | 0/867 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/main.py", line 178, in <module>
    main()
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/main.py", line 175, in main
    train(model, train_loader, eval_loader, args,qid2type)
  File "/mnt/44b643af-38ed-4d24-abcc-00e81b36025c/kartik/KDDAug/train.py", line 280, in train
    visual_grad = torch.autograd.grad((pred * (a > 0).float()).sum(), v, create_graph=True)[0]
  File "/home/kartik/.conda/envs/BLIP_env/lib/python3.10/site-packages/torch/autograd/__init__.py", line 300, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 2048]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
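The RuntimeError says that the output of a ReLU (a [512, 2048] tensor, which would be consistent with a batch of 512 passing through a 2048-wide hidden layer, though that is an inference) was overwritten in place before torch.autograd.grad could use it. One common cause of this pattern, assumed here rather than confirmed for KDDAug, is an nn.Dropout(..., inplace=True) or another in-place op applied directly to a ReLU output, since ReLU's backward pass reuses its own output. The toy reproduction below uses hypothetical names (build_classifier, input_gradient, try_both) and mirrors the gradient call from train.py: with in-place dropout the call fails with the same kind of error, and with inplace=False it succeeds.

```python
import torch
import torch.nn as nn


def build_classifier(in_dim: int, hid_dim: int, out_dim: int, inplace_dropout: bool) -> nn.Sequential:
    # Hypothetical toy classifier: Linear -> ReLU -> Dropout -> Linear.
    # With inplace_dropout=True, dropout overwrites the ReLU output in place,
    # bumping its version counter after ReluBackward0 has saved it.
    return nn.Sequential(
        nn.Linear(in_dim, hid_dim),
        nn.ReLU(),
        nn.Dropout(0.5, inplace=inplace_dropout),
        nn.Linear(hid_dim, out_dim),
    )


def input_gradient(model: nn.Module, v: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    # Same pattern as the traceback: gradient of the answer-masked score
    # with respect to the input features, keeping the graph for later use.
    v = v.clone().requires_grad_(True)
    pred = model(v)
    return torch.autograd.grad((pred * (a > 0).float()).sum(), v, create_graph=True)[0]


def try_both(seed: int = 0) -> dict:
    # Maps inplace flag -> gradient shape on success, or the error message.
    torch.manual_seed(seed)
    v = torch.randn(4, 16)
    a = torch.rand(4, 10)
    results = {}
    for inplace in (True, False):
        model = build_classifier(16, 32, 10, inplace_dropout=inplace)
        model.train()  # dropout only modifies its input in training mode
        try:
            results[inplace] = tuple(input_gradient(model, v, a).shape)
        except RuntimeError as err:
            results[inplace] = str(err)
    return results


if __name__ == "__main__":
    for inplace, outcome in try_both().items():
        print(f"inplace dropout={inplace}: {outcome}")
```

If the repository's model does contain such a layer, switching it to inplace=False is the corresponding change to try. Otherwise, wrapping the failing step in torch.autograd.set_detect_anomaly(True), as the error hint suggests, should point at the forward operation responsible.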

@hegdekartik
Author

hegdekartik commented Jul 6, 2023

Hi,

I am still having this issue. Could you please take a look and help me resolve it? Thank you.
