[feat,refactor,fix] Major change: multitasking, See details #173
…ommit in description
- This PR has breaking changes for API and will break a lot of things before v0.3.1
- Upgrades for 1.2.0
- Add support for MultiTasking, multiple datasets can be trained together now
- Add proper version for fastText
- Remove concept of tasks, datasets are first class citizens
- Update the folder structure to reflect datasets as first class citizens
- Fixes for Distributed setup
Overall looks good. A few nits. Please test thoroughly before merging, as it has a lot of BC-breaking changes.
configs/vqa/vqa2/pythia.yml
Outdated
vqa2:
  image_features:
    train:
    - /private/home/asg/datasets/COCO/detectron_fix_100/fc6/train_val_2014,coco/resnet152/train_val_2014
Remove /private/home/asg/
vqa2:
  image_features:
    train:
    - /private/home/asg/datasets/COCO/detectron_fix_100/fc6/train_val_2014,coco/resnet152/train_val_2014
Same as above.
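One way to catch user-specific paths like the ones flagged above before they land in a shared config is to scan each feature entry for absolute paths. The sketch below is only illustrative: `absolute_entries` is a hypothetical helper, not part of Pythia, and it assumes that entries are comma-separated, as in the quoted line.

```python
import posixpath


def absolute_entries(feature_spec):
    """Return the comma-separated entries of feature_spec that are absolute paths.

    Hypothetical helper for illustration; not part of the Pythia codebase.
    """
    entries = [entry.strip() for entry in feature_spec.split(",")]
    return [entry for entry in entries if posixpath.isabs(entry)]


# The value quoted in the review: the first entry is tied to one user's home directory.
spec = (
    "/private/home/asg/datasets/COCO/detectron_fix_100/fc6/train_val_2014,"
    "coco/resnet152/train_val_2014"
)
flagged = absolute_entries(spec)
print(flagged)
```

Only the first entry is reported here; the relative `coco/resnet152/train_val_2014` entry passes the check.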
        self._load_imdb(imdb_path)

    def _load_imdb(self, imdb_path):
        if imdb_path.endswith(".npy"):
            self._load_npy(imdb_path)
        elif imdb_path.endswith(".jsonl"):
            self._load_jsonl(imdb_path)
        elif imdb_path.contains("visdial") or imdb_path.contains("visual_dialog"):
What is the difference between visdial and visual_dialog?
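One detail about the last branch of the quoted snippet: Python's built-in `str` has no `contains` method, so if `imdb_path` is a plain string that condition would raise `AttributeError` when reached. The usual substring test is the `in` operator. A minimal sketch of the same dispatch follows; `pick_loader` and its return labels are hypothetical stand-ins for the `_load_*` methods, and raising `ValueError` for unknown paths is an assumption, since the diff does not show the fallback.

```python
def pick_loader(imdb_path):
    """Return a label for the loader that would handle imdb_path.

    Hypothetical stand-in for the _load_imdb dispatch quoted above.
    """
    if imdb_path.endswith(".npy"):
        return "npy"
    if imdb_path.endswith(".jsonl"):
        return "jsonl"
    # Substring test with `in`; str has no .contains() method.
    if "visdial" in imdb_path or "visual_dialog" in imdb_path:
        return "visdial"
    # Assumed fallback: the original diff does not show this branch.
    raise ValueError(f"Unsupported imdb path: {imdb_path}")


print(pick_loader("imdb/vqa2/train.npy"))
print(pick_loader("imdb/visual_dialog/train.json"))
```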
pythia/trainers/base_trainer.py
Outdated
@@ -57,7 +58,7 @@ def _init_process_group(self):
             raise RuntimeError(
                 "Unable to initialize process group: NCCL is not available"
             )
-        torch.distributed.init_process_group(backend="nccl")
+        torch.distributed.init_process_group(backend="nccl", init_method="env://")
The default value of init_method is env://. Do we need to specify it explicitly?
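For context on the question above: with `env://` initialization, `torch.distributed` reads its rendezvous settings from the environment variables `MASTER_ADDR` and `MASTER_PORT`, plus `RANK` and `WORLD_SIZE` unless those are passed as arguments. The sketch below uses only the standard library to report which of these are unset before a launch. `missing_env_init_vars` is a hypothetical helper, not part of Pythia or PyTorch.

```python
import os

# Variables consulted by env:// initialization.
ENV_INIT_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_env_init_vars(environ=None):
    """Return the env:// variables that are unset or empty in environ.

    Hypothetical pre-launch check; defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ
    return [name for name in ENV_INIT_VARS if not environ.get(name)]


# A single-process setup with every variable present.
example_env = {
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "29500",
    "RANK": "0",
    "WORLD_SIZE": "1",
}
print(missing_env_init_vars(example_env))
```

Running a check like this before the trainer starts can turn a rendezvous failure into a clearer error message.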
pythia/datasets/multi_dataset.py
Outdated
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
# from torch.utils.data.distributed import DistributedSampler
Remove
* [feat,refactor,bug] Major change: multitasking, See details for the commit in description
  - This PR has breaking changes for API and will break a lot of things before v0.3.1
  - Upgrades for 1.2.0
  - Add support for MultiTasking, multiple datasets can be trained together now
  - Add proper version for fastText
  - Remove concept of tasks, datasets are first class citizens
  - Update the folder structure to reflect datasets as first class citizens
  - Fixes for Distributed setup
* [fix] Remove import for single dataset
* [fix] Fix metrics name in configs
* [fix] Fix values for tests due to changes in PyTorch 1.2
* [fix] Remove print statement in test
* [fix] Address comments
* [fix] Pythia train and val configuration
* [fix] TextVQA config
* Update multi_dataset.py
* Update base_trainer.py
Summary: Pull Request resolved: fairinternal/mmf-internal#173
* adds a feature in download.py to allow it to work for manifold
* onboards clip_processor to work with all three configs below:
  * local on disk file
  * manifold file
  * http file
Reviewed By: vedanuj
Differential Revision: D27760358
fbshipit-source-id: 1b7b8eff09a21e8afc48971d69d46df18a8ced6b