These instructions describe how to reproduce the TabIQA experiments on VQAonBD 2023.
- Preliminary technical report: https://arxiv.org/abs/2303.14935
- Ranking in the test leaderboard: https://ilocr.iiit.ac.in/vqabd/leaderboard.html
  - 2023/04/02: NII-TabIQA ranked second, achieving a Weighted Categories Average score of 0.901.
To set up the itabqa environment:

```bash
git clone https://github.com/phucty/itabqa.git
cd itabqa
conda create -n itabqa python=3.8
conda activate itabqa
pip install poetry
poetry shell
poetry install
```
```bash
git clone https://github.com/phucty/MTL-TabNet.git
```

Please follow the MTL-TabNet instructions to install the module.
```bash
git clone https://github.com/phucty/OmniTab.git
```

Please follow the OmniTab instructions to install the tool. You might need to install OmniTab in a separate conda environment with a different PyTorch version.
Please set the working directories for your environment in the `itabqa/config.py` file:

- `HOME_ROOT`: the itabqa project directory, e.g., `/home/phuc/itabqa`
- `DATA_ROOT`: stores models and datasets, e.g., `/disks/strg16-176/VQAonBD2023/data`
- `DATA_VQA`: the VQAonBD 2023 dataset, e.g., `{DATA_ROOT}/vqaondb2023`
- `DATA_PUBTAB`: HTML tables inferred by the table structure extraction module, e.g., `{DATA_ROOT}/TR_output`
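For concreteness, a minimal sketch of `itabqa/config.py`, assuming the settings are plain module-level constants (the paths are examples; adapt them to your machine):

```python
# itabqa/config.py -- example settings; adjust every path to your environment.
import os

HOME_ROOT = "/home/phuc/itabqa"                     # itabqa project directory
DATA_ROOT = "/disks/strg16-176/VQAonBD2023/data"    # models and datasets
DATA_VQA = os.path.join(DATA_ROOT, "vqaondb2023")   # VQAonBD 2023 dataset
DATA_PUBTAB = os.path.join(DATA_ROOT, "TR_output")  # HTML tables from table structure extraction
```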
Please install MTL-TabNet (the table structure extraction module) on your GPU server and download the checkpoint file from here. Run the following command to generate HTML tables from the document images (you can change the paths of the input images, the outputs, and the checkpoint file):
```bash
CUDA_VISIBLE_DEVICES=0 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 1 0
```
You can run the script on multiple GPUs (4 GPUs here) with the following commands; the two positional arguments appear to be the number of partitions and the zero-based partition index:
```bash
CUDA_VISIBLE_DEVICES=0 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 0
CUDA_VISIBLE_DEVICES=1 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 1
CUDA_VISIBLE_DEVICES=2 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 2
CUDA_VISIBLE_DEVICES=3 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 3
```
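As an illustration (not the script's actual code), `<num_partitions> <partition_index>` sharding of this kind typically strides over the sorted input images like this:

```python
# Hypothetical sketch of how "<num_partitions> <partition_index>" sharding
# usually splits the work: each process takes every num_partitions-th image.
import sys
from pathlib import Path

num_partitions, partition_index = int(sys.argv[1]), int(sys.argv[2])
image_dir = Path("/disks/strg16-176/VQAonBD2023/data/images")  # example path
images = sorted(image_dir.glob("*.png"))
for img in images[partition_index::num_partitions]:
    print(f"partition {partition_index}/{num_partitions}: {img.name}")
```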
Next, generate the training samples for fine-tuning the question-answering model:

```bash
python run_gen_training_samples.py
```
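The exact record schema is defined by `run_gen_training_samples.py` and OmniTab's `run.py`; as an assumption, following the WikiTableQuestions-style format that TAPEX/OmniTab fine-tuning scripts commonly use, a training record would look roughly like:

```json
{
  "question": "What is the revenue for 2019?",
  "answers": ["12,345"],
  "table": {
    "header": ["Item", "2019", "2018"],
    "rows": [["Revenue", "12,345", "11,876"], ["Net income", "1,234", "1,098"]]
  }
}
```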
Note: We fine-tune OmniTab on 4 A100 40GB GPUs. If you have V100 GPUs, please change `per_device_train_batch_size` and `per_device_eval_batch_size` to 6.
```bash
cd OmniTab
conda activate omnitab
python -m torch.distributed.launch --nproc_per_node=4 run.py \
    --do_train \
    --train_file /disks/strg16-176/VQAonBD2023/data/train_all_raw.json \
    --validation_file /disks/strg16-176/VQAonBD2023/data/train_100_raw.json \
    --model_name_or_path neulab/omnitab-large \
    --output_dir /disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw \
    --max_source_length 1024 \
    --max_target_length 128 \
    --val_max_target_length 128 \
    --per_device_train_batch_size 12 \
    --gradient_accumulation_steps 2 \
    --per_device_eval_batch_size 12 \
    --num_train_epochs 50.0 \
    --warmup_ratio 0.1 \
    --learning_rate 2e-5 \
    --fp16 \
    --logging_steps 100 \
    --eval_steps 1000000 \
    --save_steps 50000 \
    --evaluation_strategy steps \
    --predict_with_generate \
    --num_beams 5 \
    --generation_max_length 128 \
    --overwrite_output_dir
```
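With these settings, the effective training batch size is 4 GPUs × 12 examples per device × 2 gradient-accumulation steps = 96 examples per optimizer update.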
The QA models are implemented in `itabqa/qa.py`.
After fine-tuning with the command above, the model is saved in `/disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw`. The pretrained model is available here.
We can run QA inference as follows:

```bash
cd ..
python run_qa_inference.py
```

The answers will be saved in `answers/raw_3_all`.
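For reference, a minimal sketch of single-example OmniTab inference with Hugging Face `transformers`, assuming the fine-tuned checkpoint directory from above (the toy table and question are placeholders; `run_qa_inference.py` handles the actual VQAonBD tables and batching):

```python
# Minimal single-example QA with an OmniTab checkpoint (toy table and question).
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Fine-tuned checkpoint from the training step; "neulab/omnitab-large" also works.
model_path = "/disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

table = pd.DataFrame({"Item": ["Revenue", "Net income"], "2019": ["12,345", "1,234"]})
query = "What is the revenue for 2019?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")
outputs = model.generate(**encoding, num_beams=5, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```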
If you find the TabIQA tool useful in your work and want to cite it, please use the following reference:
```bibtex
@article{nguyen2023tabiqa,
  title={TabIQA: Table Questions Answering on Business Document Images},
  author={Nguyen, Phuc and Ly, Nam Tuan and Takeda, Hideaki and Takasu, Atsuhiro},
  journal={arXiv preprint arXiv:2303.14935},
  year={2023}
}
```