
Do I need to download the tremendous weights again? #44

Closed
fanlyu opened this issue Apr 30, 2019 · 6 comments

Comments

@fanlyu

fanlyu commented Apr 30, 2019

I see the code has been updated, so do I need to download the huge weights again?

@apsdehal
Contributor

Mostly not, except for the new model. The features can stay the same, but they will require some directory restructuring.

@fanlyu
Author

fanlyu commented Apr 30, 2019

I tried the old-version weights by putting vqa/train2014 and vqa/val2014 into train_val_2014.
Then I ran the code with the pythia model as
python tools/run.py --tasks vqa --datasets vqa2 --model pythia --config configs/vqa/vqa2/pythia.yml

But I got the error below. How do I solve it?

2019-04-30T17:16:15 INFO: Starting training...
2019-04-30T17:16:16 ERROR: 'Traceback (most recent call last):\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/lvfan/pythia_new/pythia/tasks/multi_task.py", line 73, in __getitem__\n    item = self.chosen_task[idx]\n  File "/home/lvfan/pythia_new/pythia/tasks/base_task.py", line 154, in __getitem__\n    item = self.chosen_dataset[idx]\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 81, in __getitem__\n    return self.datasets[dataset_idx][sample_idx]\n  File "/home/lvfan/pythia_new/pythia/tasks/base_dataset.py", line 49, in __getitem__\n    sample = self.get_item(idx)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 93, in get_item\n    return self.load_item(idx)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 127, in load_item\n    current_sample = self.add_answer_info(sample_info, current_sample)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 161, in add_answer_info\n    {"answers": answers, "tokens": sample_info["ocr_tokens"]}\nKeyError: \'ocr_tokens\'\n'
Traceback (most recent call last):
  File "tools/run.py", line 87, in <module>
    run()
  File "tools/run.py", line 76, in run
    trainer.train()
  File "/home/lvfan/pythia_new/pythia/common/trainer.py", line 240, in train
    for batch in self.train_loader:
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/lvfan/pythia_new/pythia/tasks/multi_task.py", line 73, in __getitem__\n    item = self.chosen_task[idx]\n  File "/home/lvfan/pythia_new/pythia/tasks/base_task.py", line 154, in __getitem__\n    item = self.chosen_dataset[idx]\n  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 81, in __getitem__\n    return self.datasets[dataset_idx][sample_idx]\n  File "/home/lvfan/pythia_new/pythia/tasks/base_dataset.py", line 49, in __getitem__\n    sample = self.get_item(idx)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 93, in get_item\n    return self.load_item(idx)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 127, in load_item\n    current_sample = self.add_answer_info(sample_info, current_sample)\n  File "/home/lvfan/pythia_new/pythia/tasks/vqa/vqa2/dataset.py", line 161, in add_answer_info\n    {"answers": answers, "tokens": sample_info["ocr_tokens"]}\nKeyError: \'ocr_tokens\'\n'

@apsdehal
Contributor

That's an imdb-related issue. It would be better if you download the new imdbs; they are not that big. Alternatively, I also pushed a small change that makes sure tokens are not loaded for answers when use_ocr is False.
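For readers hitting the same KeyError: the traceback shows add_answer_info unconditionally reading sample_info["ocr_tokens"], a key that old imdbs simply do not contain. A minimal sketch of the kind of guard the fix adds; the class and attribute names here (AnswerProcessor, use_ocr) only mirror the names visible in the traceback, and the real Pythia code may differ:

```python
class AnswerProcessor:
    """Sketch of guarding OCR-token access behind the use_ocr flag.

    Hypothetical stand-in for the dataset code in the traceback above;
    not the actual Pythia implementation.
    """

    def __init__(self, use_ocr=False):
        self.use_ocr = use_ocr

    def add_answer_info(self, sample_info, sample):
        answers = sample_info.get("answers", [])
        if self.use_ocr:
            # Only touch ocr_tokens when OCR is actually enabled; reading it
            # unconditionally is what raised KeyError on old imdbs.
            tokens = sample_info["ocr_tokens"]
        else:
            # With OCR disabled, old imdbs without the key still load fine.
            tokens = []
        sample["answers_info"] = {"answers": answers, "tokens": tokens}
        return sample
```

With use_ocr=False the old imdbs load without error; with use_ocr=True the new imdbs (which carry the key) are still handled.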

@fanlyu
Author

fanlyu commented May 3, 2019

> That's an imdb-related issue. It would be better if you download the new imdbs; they are not that big. Alternatively, I also pushed a small change that makes sure tokens are not loaded for answers when use_ocr is False.

Hey, that addressed the problem, but I found that when this runs:

def finalize(self):
    torch.save(self.trainer.model, self.pth_filepath)
    torch.save(self.trainer.model.state_dict(), self.params_filepath)

it fails with this error:

2019-05-03T17:18:35 ERROR: can't pickle _thread.lock objects
Traceback (most recent call last):
  File "tools/run.py", line 87, in <module>
    run()
  File "tools/run.py", line 76, in run
    trainer.train()
  File "/home/lvfan/pythia_new/pythia/common/trainer.py", line 260, in train
    self.finalize()
  File "/home/lvfan/pythia_new/pythia/common/trainer.py", line 296, in finalize
    self.checkpoint.finalize()
  File "/home/lvfan/pythia_new/pythia/utils/checkpoint.py", line 243, in finalize
    torch.save(self.trainer.model, self.pth_filepath)
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 219, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 144, in _with_file_like
    return body(f)
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 219, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/lvfan/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 292, in _save
    pickler.dump(obj)
TypeError: can't pickle _thread.lock objects

That is weird, and it makes inference fail.
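For context, torch.save(model, path) pickles the whole module object, so any unpicklable attribute attached to it (a logger handle, a summary writer, anything holding a threading.Lock) triggers exactly this TypeError, while torch.save(model.state_dict(), path) serializes only the parameter data. A minimal stdlib illustration of the same failure mode, using plain pickle instead of torch and a hypothetical FakeModel class:

```python
import pickle
import threading

class FakeModel:
    """Hypothetical stand-in for a model carrying an unpicklable attribute,
    e.g. a logger or summary writer that holds a threading.Lock."""

    def __init__(self):
        self.weights = {"layer.weight": [0.1, 0.2]}  # plain, picklable data
        self.writer_lock = threading.Lock()          # not picklable

    def state_dict(self):
        # Only the parameter data, analogous to nn.Module.state_dict()
        return self.weights

model = FakeModel()

# Pickling the whole object fails, like torch.save(model, path) did above.
try:
    pickle.dumps(model)
    whole_model_ok = True
except TypeError:
    whole_model_ok = False

# Pickling only the state dict succeeds,
# like torch.save(model.state_dict(), path).
state_bytes = pickle.dumps(model.state_dict())

print(whole_model_ok)             # False
print(pickle.loads(state_bytes))  # {'layer.weight': [0.1, 0.2]}
```

This is why saving only the state_dict (and restoring it with load_state_dict into a freshly constructed model) is the more robust pattern.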

@apsdehal
Contributor

apsdehal commented May 8, 2019

@fanlyu Can you check now? It should be fixed in master.

@fanlyu
Author

fanlyu commented May 9, 2019

@apsdehal It works, thanks! I'll close the issue.
