[feat] add M4C model for TextVQA #213
Conversation
Force-pushed from 8423d60 to e5b58fd, then from e5b58fd to d5c59f1.
Any plans to fix the build?
Yes! After finishing integration of the captioning experiments, I'm addressing the issues you mentioned offline last week (including the CI errors above), and will update this PR.
Please make sure all new files have the required copyright headers.
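For reference, existing Pythia source files start with a header along these lines (copy the exact wording from files already in the repo):

```python
# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
```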
"wouldnt": "wouldn't", | ||
"wouldnt've": "wouldn't've", | ||
"wouldn'tve": "wouldn't've", | ||
"yall": "y'all", | ||
"yall'll": "y'all'll", | ||
"y'allll": "y'all'll", | ||
"yall'd've": "y'all'd've", | ||
"y'alld've": "y'all'd've", | ||
"y'all'dve": "y'all'd've", | ||
"youd": "you'd", | ||
"youd've": "you'd've", | ||
"you'dve": "you'd've", | ||
"youll": "you'll", | ||
"youre": "you're", | ||
"youve": "you've", | ||
} | ||
|
||
NUMBER_MAP = { | ||
"none": "0", | ||
"zero": "0", | ||
"one": "1", | ||
"two": "2", | ||
"three": "3", | ||
"four": "4", | ||
"five": "5", | ||
"six": "6", | ||
"seven": "7", | ||
"eight": "8", | ||
"nine": "9", | ||
"ten": "10", | ||
} | ||
ARTICLES = ["a", "an", "the"] | ||
PERIOD_STRIP = re.compile("(?!<=\d)(\.)(?!\d)") | ||
COMMA_STRIP = re.compile("(?<=\d)(\,)+(?=\d)") | ||
PUNCTUATIONS = [ | ||
";", | ||
r"/", | ||
"[", | ||
"]", | ||
'"', | ||
"{", | ||
"}", | ||
"(", | ||
")", | ||
"=", | ||
"+", | ||
"\\", | ||
"_", | ||
"-", | ||
">", | ||
"<", | ||
"@", | ||
"`", | ||
",", | ||
"?", | ||
"!", | ||
] | ||
|
||
def __init__(self, *args, **kwargs): | ||
pass | ||
|
||
def word_tokenize(self, word): | ||
word = word.lower() | ||
word = word.replace(",", "").replace("?", "").replace("'s", " 's") | ||
return word.strip() | ||
|
||
def process_punctuation(self, in_text): | ||
out_text = in_text | ||
for p in self.PUNCTUATIONS: | ||
if (p + " " in in_text or " " + p in in_text) or ( | ||
re.search(self.COMMA_STRIP, in_text) is not None | ||
): | ||
out_text = out_text.replace(p, "") | ||
else: | ||
out_text = out_text.replace(p, " ") | ||
out_text = self.PERIOD_STRIP.sub("", out_text, re.UNICODE) | ||
return out_text | ||
|
||
def process_digit_article(self, in_text): | ||
out_text = [] | ||
temp_text = in_text.lower().split() | ||
for word in temp_text: | ||
word = self.NUMBER_MAP.setdefault(word, word) | ||
if word not in self.ARTICLES: | ||
out_text.append(word) | ||
else: | ||
pass | ||
for word_id, word in enumerate(out_text): | ||
if word in self.CONTRACTIONS: | ||
out_text[word_id] = self.CONTRACTIONS[word] | ||
out_text = " ".join(out_text) | ||
return out_text | ||
|
||
def __call__(self, item): | ||
item = self.word_tokenize(item) | ||
item = item.replace("\n", " ").replace("\t", " ").strip() | ||
item = self.process_punctuation(item) | ||
item = self.process_digit_article(item) | ||
return item | ||
|
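For context, a minimal usage sketch of the processor above; the class name `EvalAIAnswerProcessor` is an assumption not shown in this diff:

```python
# Hypothetical usage; the class name is assumed, not shown in the diff.
processor = EvalAIAnswerProcessor()

print(processor("A Red, Shiny ball?!"))  # -> "red shiny ball"
print(processor("Three, maybe four"))    # -> "3 maybe 4"
```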
Please reuse the existing code and remove this duplication.
@vedanuj Thanks for the comments! I'll fix it later this week.
Force-pushed from ec1b1dc to e0e99c9.
@vedanuj thanks for your review!
It's fixed now, by [...]
The header [...]
I found that this is non-trivial to fix. This PR is made against v0.4. However, the [...] Do you have suggestions on how to proceed here?
Force-pushed from e0e99c9 to 5a15b40.
I am landing this internally, but it would be great if you could work on this in a separate PR.
```
requests==2.21.0
fasttext==0.9.1
fastText
```
Any particular reason that we are not using a specific version here?
```
nltk==3.4.1
pytorch-transformers==1.2.0
editdistance
```
Same here?
```
@@ -0,0 +1,146 @@
// C implementation of the PHOC representation. Converts a string into a PHOC feature vector
```
We should move M4C-specific utils to a folder `utils/m4c_utils`.
```
@@ -0,0 +1,50 @@
```
This should go inside distributed utils.
```
@@ -0,0 +1,22 @@
dataset_attributes:
```
Configs currently have a lot of replication and are not fully utilizing the power of inheritance in our configuration system.
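To illustrate what config inheritance buys, here is a simplified sketch of include-based config merging (not Pythia's actual configuration system; names are illustrative):

```python
def deep_merge(base, override):
    """Recursively overlay `override` onto `base`, as an include-based
    config system would: children only state what differs from the parent."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base_cfg = {"optimizer": {"type": "Adam", "lr": 1e-4}, "batch_size": 128}
child_cfg = {"optimizer": {"lr": 5e-5}}  # the child only declares the delta
print(deep_merge(base_cfg, child_cfg))
# {'optimizer': {'type': 'Adam', 'lr': 5e-05}, 'batch_size': 128}
```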
In general this looks good. I think a lot of code/config duplication can be avoided.
```python
# install `vqa-maskrcnn-benchmark` from
# https://github.com/ronghanghu/vqa-maskrcnn-benchmark-m4c
import sys; sys.path.append('/private/home/ronghanghu/workspace/vqa-maskrcnn-benchmark')  # NoQA
```
Is there some specific change done for M4C? If not, can the https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark repo be used here? It is already being used in the feature extraction script Pythia provides, pythia/scripts/features/extract_features_vmb.py.
Yeah, this needs a separate look.
Yes, there are specific changes for OCR feature extraction. The major change is to allow RoI-pooling from externally specified bounding boxes (OCR boxes in our use case) instead of from the Faster R-CNN's own RPN proposals. The new branch (https://github.com/ronghanghu/vqa-maskrcnn-benchmark-m4c) is compatible with pythia/scripts/features/extract_features_vmb.py, so it can also be landed into https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark.
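To illustrate the idea (not the actual vqa-maskrcnn-benchmark code), a minimal sketch of pooling features from externally supplied boxes with torchvision; the feature-map shape, box values, and the 1/16 spatial scale are illustrative assumptions:

```python
import torch
from torchvision.ops import roi_align

# Sketch: pool features for externally supplied OCR boxes instead of RPN
# proposals. `feature_map` stands in for a Faster R-CNN backbone output.
feature_map = torch.randn(1, 1024, 50, 50)                 # (N, C, H/16, W/16)
ocr_boxes = torch.tensor([[0, 16.0, 16.0, 128.0, 64.0]])   # (batch_idx, x1, y1, x2, y2)

ocr_features = roi_align(
    feature_map,
    ocr_boxes,
    output_size=(7, 7),      # standard RoI output resolution
    spatial_scale=1.0 / 16,  # maps image coordinates to feature-map coordinates
)
print(ocr_features.shape)    # torch.Size([1, 1024, 7, 7])
```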
```diff
@@ -24,3 +24,5 @@ eggs/
 *.egg
 .DS_Store
 .vscode/*
+*.so
+*-checkpoint.ipynb
```
Are `.ipynb` files generated by any of the scripts added here? If not, please remove this entry.
Thanks, I'll remove this change. The `*-checkpoint.ipynb` files are generated automatically by Jupyter Notebook servers (similar to how vim generates .swp files, except that they are not deleted after the Jupyter Notebook server is shut down). But those notebooks were used primarily during our internal analyses, and I did not add any here.
```yaml
- open_images/detectron_fix_100/fc6/train,m4c_textvqa_ocr_en_frcn_features/train_images
- open_images/detectron_fix_100/fc6/train,m4c_textvqa_ocr_en_frcn_features/train_images
- open_images/detectron_fix_100/fc6/train,m4c_textvqa_ocr_en_frcn_features/train_images
# ... (the same entry is repeated 31 times in total in the diff)
```
This behaviour should be configurable from code in the future; it looks very messy now.
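For instance (an illustrative sketch, not the PR's actual config code), the repeated entries could be built programmatically:

```python
# Illustrative only: build the repeated feature-path list in code instead
# of duplicating the same YAML entry dozens of times.
entry = (
    "open_images/detectron_fix_100/fc6/train,"
    "m4c_textvqa_ocr_en_frcn_features/train_images"
)
num_train_splits = 31  # hypothetical count taken from the diff above
image_features_train = [entry] * num_train_splits
print(len(image_features_train))
```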
```python
import cv2
import numpy as np
import torch
from PIL import Image


def _image_transform(image_path):
    img = Image.open(image_path)
    im = np.array(img).astype(np.float32)
    # handle a few corner cases
    if im.ndim == 2:  # gray => RGB
        im = np.tile(im[:, :, None], (1, 1, 3))
    if im.shape[2] > 3:  # RGBA => RGB
        im = im[:, :, :3]

    im = im[:, :, ::-1]  # RGB => BGR
    im -= np.array([102.9801, 115.9465, 122.7717])  # Detectron BGR pixel means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(800) / float(im_size_min)  # scale the shorter side to 800
    # Prevent the biggest axis from being more than max_size
    if np.round(im_scale * im_size_max) > 1333:
        im_scale = float(1333) / float(im_size_max)
    im = cv2.resize(
        im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR
    )
    img = torch.from_numpy(im).permute(2, 0, 1)
    return img, im_scale
```
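A quick usage sketch of the transform above (the image path is a placeholder):

```python
# Hypothetical call; "example.jpg" is a placeholder path.
img_tensor, im_scale = _image_transform("example.jpg")
print(img_tensor.shape)  # (3, H, W); BGR, mean-subtracted, shorter side ~800
print(im_scale)          # resize factor for mapping boxes back to original coords
```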
Please check if code can be reused between this and pythia/scripts/features/extract_features_vmb.py.
```python
@registry.register_processor("bert_tokenizer")
class BertTokenizerProcessor(BaseProcessor):
```
@apsdehal We should check if this is the same as or different from the BertTokenizer we have internally when merging.
Ours is inside `processors/bert`.
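For readers unfamiliar with the pattern, a minimal sketch of how such a decorator-based registry works (a simplification, not Pythia's actual registry implementation):

```python
# Simplified stand-in for Pythia's registry: a decorator records each
# processor class under a string key so configs can refer to it by name.
class Registry:
    processors = {}

    @classmethod
    def register_processor(cls, name):
        def wrap(processor_cls):
            cls.processors[name] = processor_cls
            return processor_cls
        return wrap

registry = Registry()

@registry.register_processor("bert_tokenizer")
class BertTokenizerProcessor:
    def __call__(self, text):
        return text.lower().split()  # placeholder tokenization

# A config value like "bert_tokenizer" can then be resolved by name:
tokenizer = Registry.processors["bert_tokenizer"]()
print(tokenizer("M4C for TextVQA"))
```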
```
@@ -160,7 +160,7 @@ def forward(self, sample_list, model_output, *args, **kwargs):
        if loss.dim() == 0:
            loss = loss.view(1)

        key = "{}/{}/{}".format(
```
Was this a bug earlier?
Yeah, I noticed this and fixed it in the rebase. It was already fixed in the dev branch.
```diff
@@ -42,7 +42,7 @@ def __init__(self, trainer):

         self.models_foldername = os.path.join(self.ckpt_foldername, "models")
         if not os.path.exists(self.models_foldername):
-            os.makedirs(self.models_foldername)
+            os.makedirs(self.models_foldername, exist_ok=True)
```
Can we not change this? This might have dangerous consequences where we overwrite trained models by mistake.
This is fine. This is an issue in the case of distributed training, which is already fixed in our dev branch. If you look at the `if` statement above you will understand: when there is a race condition and one of the jobs has already made the folder, the whole thing fails. That's why `exist_ok=True` is needed. `exist_ok` doesn't overwrite anything; it just doesn't throw an error if the folder is already there.
Hi, from Oleksii's experience, without this change the program frequently crashes (over 50% of the time) in distributed training due to a race condition: multiple processes try to create the directories, and there is no lock on these lines.
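To make the failure mode concrete, a small sketch (the folder name is illustrative):

```python
import os

models_foldername = "save/models"  # illustrative path

# Race-prone: between the exists() check and makedirs(), another
# distributed worker may create the folder, so makedirs() raises
# FileExistsError and the job crashes.
if not os.path.exists(models_foldername):
    os.makedirs(models_foldername)

# Race-safe: exist_ok=True only suppresses the "already exists" error;
# it never truncates or overwrites anything inside the folder.
os.makedirs(models_foldername, exist_ok=True)
```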
```diff
-torch==1.2.0
+torch>=1.2
-torchvision==0.2.2
+torchvision>0.2
-tensorboardX==1.2
+tensorboardX>=1.2
 numpy>=1.14
-tqdm==4.19.9
+tqdm>=4.19
 demjson>=2.2
 torchtext>=0.2
 GitPython>=2.1
 PyYAML>=3.11
-pytest==3.3.2
+pytest==5.2.0
```
Are these changes necessary for this PR? If not, please remove them from it.
I removed the changes with `>=` as they can be breaking in nature. The rest I accepted and then updated with our changes as of now.
@ronghanghu This has landed internally on dev. Please make further PRs against that branch. This should be automatically closed once it lands in master.
Also, I would suggest moving to the automatic download API instead of making users download everything manually per the instructions in your readme.
Closing as landed internally.
Closes #213

Merge the M4C model (https://arxiv.org/pdf/1911.06258.pdf) for TextVQA into Pythia.

Summary of changes:

* Adding `README.md` under `projects/M4C`
* Adding new models: M4C under `pythia/models/m4c.py`
* Adding new dataset classes: `m4c_textvqa`, `m4c_stvqa`, and `m4c_ocrvqa` under `pythia/datasets/vqa/`
* Adding new config files under `configs/vqa`
* Adding new processors, metrics and losses for M4C training and evaluation.
* Adding other utilities (such as PHOC feature extraction).

Introducing new dependencies (added to `requirements.txt`):

* `pytorch-transformers`
* `editdistance`

Reference:

* R. Hu, A. Singh, T. Darrell, M. Rohrbach, *Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA*. arXiv preprint arXiv:1911.06258, 2019 ([PDF](https://arxiv.org/pdf/1911.06258.pdf))

```
@Article{hu2019iterative,
  title={Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA},
  author={Hu, Ronghang and Singh, Amanpreet and Darrell, Trevor and Rohrbach, Marcus},
  journal={arXiv preprint arXiv:1911.06258},
  year={2019}
}
```

Vocabs, ImDBs and features:

| Datasets | M4C Vocabs | M4C ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features |
|----------|------------|-----------|------------------------------|---------------------------|
| TextVQA | [All Vocabs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_vocabs.tar.gz) | [TextVQA ImDB](https://dl.fbaipublicfiles.com/pythia/m4c/data/imdb/m4c_textvqa.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | [TextVQA Rosetta-en OCRs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_textvqa_ocr_en_frcn_features.tar.gz), [TextVQA Rosetta-ml OCRs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_textvqa_ocr_ml_frcn_features.tar.gz) |
| ST-VQA | [All Vocabs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_vocabs.tar.gz) | [ST-VQA ImDB](https://dl.fbaipublicfiles.com/pythia/m4c/data/imdb/m4c_stvqa.tar.gz) | [ST-VQA Objects](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_stvqa_obj_frcn_features.tar.gz) | [ST-VQA Rosetta-en OCRs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_stvqa_ocr_en_frcn_features.tar.gz) |
| OCR-VQA | [All Vocabs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_vocabs.tar.gz) | [OCR-VQA ImDB](https://dl.fbaipublicfiles.com/pythia/m4c/data/imdb/m4c_ocrvqa.tar.gz) | [OCR-VQA Objects](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_ocrvqa_obj_frcn_features.tar.gz) | [OCR-VQA Rosetta-en OCRs](https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_ocrvqa_ocr_en_frcn_features.tar.gz) |

Pretrained models:

| Datasets | Configs (under `configs/vqa/`) | Pretrained Models | Metrics | Notes |
|----------|--------------------------------|-------------------|---------|-------|
| TextVQA (`m4c_textvqa`) | `m4c_textvqa/m4c_with_stvqa.yml` | [`download`](https://dl.fbaipublicfiles.com/pythia/m4c/m4c_release_models/m4c_textvqa/m4c_textvqa_m4c_with_stvqa.ckpt) | val accuracy - 40.55%; test accuracy - 40.46% | Rosetta-en OCRs; ST-VQA as additional data |
| TextVQA (`m4c_textvqa`) | `m4c_textvqa/m4c.yml` | [`download`](https://dl.fbaipublicfiles.com/pythia/m4c/m4c_release_models/m4c_textvqa/m4c_textvqa_m4c.ckpt) | val accuracy - 39.40%; test accuracy - 39.01% | Rosetta-en OCRs |
| TextVQA (`m4c_textvqa`) | `m4c_textvqa/m4c_ocr_ml.yml` | [`download`](https://dl.fbaipublicfiles.com/pythia/m4c/m4c_release_models/m4c_textvqa/m4c_textvqa_m4c_ocr_ml.ckpt) | val accuracy - 37.06% | Rosetta-ml OCRs |
| ST-VQA (`m4c_stvqa`) | `m4c_stvqa/m4c.yml` | [`download`](https://dl.fbaipublicfiles.com/pythia/m4c/m4c_release_models/m4c_stvqa/m4c_stvqa_m4c.ckpt) | val ANLS - 0.472 (accuracy - 38.05%); test ANLS - 0.462 | Rosetta-en OCRs |
| OCR-VQA (`m4c_ocrvqa`) | `m4c_ocrvqa/m4c.yml` | [`download`](https://dl.fbaipublicfiles.com/pythia/m4c/m4c_release_models/m4c_ocrvqa/m4c_ocrvqa_m4c.ckpt) | val accuracy - 63.52%; test accuracy - 63.87% | Rosetta-en OCRs |