This repository contains the source code for the NER system presented in the following research publication (link):
Abbas Ghaddar and Philippe Langlais
Robust Lexical Features for Improved Neural Network Named-Entity Recognition
In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
- python 3.6
- tensorflow>=1.6
- pyhocon (for parsing the configurations)
- Download the data from here and unzip the files into the data directory.
- Change the raw_path variables for the conll and ontonotes datasets in the experiments.config file to path/to/conll-2003 and path/to/conll-2012/v4/data respectively. For the conll dataset, please rename the eng.train, eng.testa, and eng.testb files to conll.train.txt, conll.dev.txt, and conll.test.txt respectively.
- Run:
  $ python preprocess.py {conll|ontonotes}
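The renaming step for the conll dataset can also be scripted. The following is a minimal sketch, not part of the repository: the helper name rename_conll_files is hypothetical, and it assumes the three eng.* files sit directly inside the directory that raw_path points to. It skips any file that is missing or already renamed, so it is safe to run more than once.

```python
from pathlib import Path

# File names taken from the instructions above:
# original CoNLL-2003 split names -> names expected by this repository.
RENAMES = {
    "eng.train": "conll.train.txt",
    "eng.testa": "conll.dev.txt",
    "eng.testb": "conll.test.txt",
}


def rename_conll_files(conll_dir):
    """Hypothetical helper: rename the CoNLL-2003 split files inside conll_dir.

    Returns the list of paths that were actually renamed.
    """
    conll_dir = Path(conll_dir)
    renamed = []
    for old_name, new_name in RENAMES.items():
        source = conll_dir / old_name
        target = conll_dir / new_name
        # Only rename when the source exists and the target would not be overwritten.
        if source.exists() and not target.exists():
            source.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    # "path/to/conll-2003" is the placeholder from the instructions;
    # replace it with your own directory.
    for path in rename_conll_files("path/to/conll-2003"):
        print("renamed to", path)
```

Run it once before preprocess.py. With the placeholder path left unchanged it does nothing, because no matching files are found.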
Once the data preprocessing is completed, you can train and test a model with:
$ python model.py {conll|ontonotes}
The following link contains the model, the entity type vocabulary, and code to generate LS embeddings for any word.
Please cite the following paper when using our code:
@InProceedings{ghaddar2018coling,
title={{Robust Lexical Features for Improved Neural Network Named-Entity Recognition}},
author={Ghaddar, Abbas and Langlais, Philippe},
booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
pages = {1896--1907},
year = {2018}
}