This is a course project for Web Data Mining. The task is to decide whether a candidate sentence contains the answer to a given question. We use ESIM (Enhanced LSTM for Natural Language Inference) as our main model. Pretrained Chinese character embeddings are adopted to facilitate character-level matching between questions and answers. We employ focal loss to address the label imbalance. A PowerPoint slide deck is attached in which we explain our method further.
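As a rough illustration of why focal loss helps with imbalanced labels, here is a minimal sketch of the binary focal loss formula, `FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)`, in plain Python. It is not the code used in this project: the function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and a PyTorch version would apply the same formula to tensors.

```python
import math


def binary_focal_loss(prob_pos, label, gamma=2.0, alpha=0.25):
    """Focal loss for one example (illustrative sketch, not the project's code).

    prob_pos: predicted probability that the sentence answers the question.
    label:    1 if it does, 0 otherwise.
    gamma:    focusing parameter; larger values down-weight easy examples more.
    alpha:    weight for the positive class (the negative class gets 1 - alpha).
    """
    eps = 1e-12
    # p_t is the probability assigned to the true class.
    p_t = prob_pos if label == 1 else 1.0 - prob_pos
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a badly wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, so the `(1 - p_t)^gamma` factor is what shrinks the contribution of the many easy negative sentences.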
Requirements:
- Python (>= 3.6)
- PyTorch (>= 1.0)
- torchtext
The dataset for this project is NLPCC DBQA 2016.
Model | MAP | MRR |
---|---|---|
All-0 | 25.30 | 25.81 |
BERT | 93.73 | 93.83 |
Ours | 90.33 | 90.48 |
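
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute MAP (mean average precision) and MRR (mean reciprocal rank) for answer sentence selection. It is an assumption about the standard definitions, not the official evaluation script; the function names and the `(score, label)` input format are hypothetical.

```python
def average_precision(ranked_labels):
    """Average of precision@k over the ranks k where a correct answer appears."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def reciprocal_rank(ranked_labels):
    """1 / rank of the first correct answer, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def map_mrr(questions):
    """questions: one list per question, each holding (score, label) pairs
    for that question's candidate sentences (hypothetical input format)."""
    if not questions:
        return 0.0, 0.0
    aps, rrs = [], []
    for candidates in questions:
        # Rank candidates by model score, highest first.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        labels = [label for _, label in ranked]
        aps.append(average_precision(labels))
        rrs.append(reciprocal_rank(labels))
    return sum(aps) / len(aps), sum(rrs) / len(rrs)


# Two toy questions: the correct sentence is ranked 2nd in the first, 1st in the second.
toy = [
    [(0.9, 0), (0.8, 1), (0.1, 0)],
    [(0.7, 1), (0.2, 0)],
]
print(map_mrr(toy))
```

When every question has exactly one correct sentence, the two metrics coincide; they diverge only when a question has several correct candidates.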