[Title] MTAAL: Multi-Task Adversarial Active Learning for Medical Named Entity Recognition and Normalization
[Authors] Baohang Zhou, Xiangrui Cai, Ying Zhang, Wenya Guo, Xiaojie Yuan
- Clone the repo to your local machine.
- Install Python 3.6.5.
- Download the pre-trained word embeddings (Word2Vec and GloVe) from their respective websites and put them into the "pretrain" folder. A quick check that the files load correctly is sketched after the install command below.
- Open a shell or cmd in the repo folder and run the following command to install the necessary packages.
pip install -r requirements.txt
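To verify that the embeddings were placed in the right folder, the short sketch below reads a GloVe text file with plain Python and numpy. The file name "glove.6B.100d.txt" is only an example; substitute whichever embedding file you actually downloaded into "pretrain".
import os
import numpy as np

# Hypothetical file name; replace it with the embedding file you downloaded.
glove_path = os.path.join("pretrain", "glove.6B.100d.txt")

embeddings = {}
with open(glove_path, encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        # GloVe text format: a token followed by its vector components.
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")

print("Loaded %d vectors of dimension %d"
      % (len(embeddings), len(next(iter(embeddings.values())))))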
- Before running the models, run the following command to preprocess the dataset. The parameters select the dataset and the word embeddings.
python preprocess.py --dataset=[ncbi, cdr] --wordembedding=[word2vec, glove]
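preprocess.py takes care of the data preparation. The sketch below only illustrates the general idea of mapping a dataset vocabulary onto a pretrained embedding matrix, which is what the word-embedding choice above feeds into; the function and variable names are hypothetical, not the ones used in the repository.
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim=100):
    # vocab: dict mapping word -> row index; pretrained: dict mapping
    # word -> vector (e.g. the `embeddings` dict from the sketch above).
    # Words without a pretrained vector get small random vectors.
    matrix = np.random.uniform(-0.25, 0.25, (len(vocab), dim)).astype("float32")
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = pretrained[word]
    return matrix

# Toy usage with hypothetical special tokens:
# vocab = {"<pad>": 0, "<unk>": 1, "cancer": 2, "gene": 3}
# matrix = build_embedding_matrix(vocab, embeddings)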
- Run the following command to train the different active learning models. The available choices for some parameters are shown in brackets, and their meanings are listed in the table below.
Parameter | Type | Description |
---|---|---|
epoch | int | Number of active learning query rounds |
label | float | Split proportion of the initial labeled set |
unlabel | float | Split proportion of the initial unlabeled set |
test | float | Split proportion of the test set |
query_num | int | Number of samples queried in each round |
ad_task | str | Whether to use task adversarial learning |
task | str | Task to run; "all" is the multi-task (NER + NEN) scenario |
al | str | Active learning query strategy |
python main.py \
--epoch=70 \
--label=0.2 \
--unlabel=0.7 \
--test=0.1 \
--batch_size=32 \
--query_num=64 \
--ad_task=[True, False] \
--dataset=[ncbi, cdr] \
--rnn_units=64 \
--task=[all, ner, nen] \
--gpu=[True, False] \
--al=[diversity, random, lc, entropy, mnlp]
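For reference, the uncertainty-based choices of the "al" parameter (lc, entropy, mnlp) are commonly computed from the tagger's per-token label probabilities. The sketch below is a generic illustration of these scores, not the exact implementation in this repository; the diversity and random strategies are omitted.
import numpy as np

def uncertainty_score(probs, method):
    # probs: array of shape (seq_len, num_labels) with one row of label
    # probabilities per token (e.g. softmax outputs of the tagger).
    max_p = probs.max(axis=1)
    if method == "lc":
        # Least confidence: one minus the probability of the best label,
        # averaged over the tokens of the sentence.
        return float(np.mean(1.0 - max_p))
    if method == "entropy":
        # Average token-level entropy of the label distribution.
        ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)
        return float(np.mean(ent))
    if method == "mnlp":
        # Length-normalized log-probability of the most likely labels;
        # lower values indicate less confident sentences.
        return float(np.mean(np.log(max_p + 1e-12)))
    raise ValueError("unknown method: %s" % method)

# To pick the next batch: score every unlabeled sentence, then take the
# `query_num` sentences with the highest scores for "lc"/"entropy" and the
# lowest scores for "mnlp".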
- After running the model, the test results are saved in the "results" folder.
PS: We use the evaluation metrics described in (Zhao et al.).
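As a rough illustration of entity-level evaluation, the sketch below computes exact-match precision, recall and F1 over predicted and gold entities. It is not guaranteed to match the exact metric definition in (Zhao et al.); for normalization, the entity tuples would additionally carry the predicted concept IDs.
def entity_prf(gold, pred):
    # gold, pred: lists with one set of entities per sentence; each entity is
    # a hashable tuple such as (start, end, type) or, for normalization,
    # (start, end, concept_id). Exact-match precision/recall/F1.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1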