Additional Material for the publication:
An Evaluation of State-of-the-Art Approaches to Relation Extraction for Usage on Domain-Specific Corpora
Christoph Brandl, Jens Albrecht and Renato Budinich
This publication was created within the research group Future Engineering.
The folder 'fe-training-data' contains all available examples from our manually labelled Future Engineering data, split into training, test, and evaluation data files. The data set is based on articles extracted from electrive.com, a news provider targeting decision-makers, manufacturers, and service providers in the e-mobility sector.
In addition, the folder 'fewrel-training-data' contains the training and evaluation data used from the FewRel data set, as described in the conference papers.
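For orientation, the following minimal Python sketch shows how a FewRel-style JSON file can be inspected; the file name is an assumption about the folder layout described above.

```python
import json

# Assumed file name; adjust to the actual file in 'fewrel-training-data'.
with open("fewrel-training-data/train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# FewRel maps each relation label to a list of instances; every instance
# holds the tokenized sentence plus head ('h') and tail ('t') entity spans.
for relation, instances in data.items():
    example = instances[0]
    head_name, _, head_positions = example["h"]
    tail_name, _, tail_positions = example["t"]
    print(relation, "-", " ".join(example["tokens"]))
    print("  head:", head_name, head_positions, "| tail:", tail_name, tail_positions)
    break
```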
This repository contains different approaches for the task of relation extraction from text. At the moment it contains working implementations of the following approaches:
- Entity-aware BLSTM based on this GitHub repository
- ERNIE based on this GitHub repository
- R-BERT based on this GitHub repository
- Matching the Blanks BERT based on this GitHub repository
- BERT Pair based on this GitHub repository
In addition, the repository contains a converter that parses TSV files from the INCEpTION annotation tool and transfers them into a data format similar to that of the FewRel data.
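As a rough illustration of the converter's input side, the sketch below collects the token sequences from a WebAnno/INCEpTION TSV export; the actual converter additionally maps the annotated entities and relations into the FewRel-like structure, which is omitted here.

```python
def read_inception_tokens(tsv_path):
    """Minimal sketch: collect the token sequence of each sentence from a
    WebAnno/INCEpTION TSV export. Entity and relation columns depend on the
    configured annotation layers and are not parsed here."""
    sentences, tokens = [], []
    with open(tsv_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#"):        # header and sentence-text comments
                continue
            if not line:                    # a blank line ends a sentence
                if tokens:
                    sentences.append(tokens)
                    tokens = []
                continue
            fields = line.split("\t")
            tokens.append(fields[2])        # the third column holds the token text
    if tokens:
        sentences.append(tokens)
    return sentences
```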
The following libraries are required:

- python == 3.6
- torch >= 1.5.0
- transformers == 3.0.0
- nltk >= 3.2.5
- rdflib >= 5.0.0
- tagme >= 0.1.3
- flair >= 0.6.0
- wptools >= 0.4.17
- pydotplus >= 2.0.2
- graphviz >= 0.10.1
- lime >= 0.2.0.1
There is a requirements.txt file included in the repository for installing all needed libraries in the correct versions.
However, note that some libraries cannot be installed via a requirements file and have to be installed separately, in particular PyTorch, Flair, and PyCurl.
In order to use the approaches in this repository, some additional files, such as pre-training checkpoints or additional data sources, have to be downloaded.
The Matching the Blanks GitHub repository provides a data file for the pre-training process of the BERT model:
The authors of the ERNIE approach provide additional data:
The data used for fine-tuning the approaches to the specific tasks is also provided:
The Entity-aware BLSTM approach uses pre-trained GloVe vectors for word representation (the extracted file should be located in a 'resource' folder inside the approach's folder):
The downloaded data can be extracted and moved into the corresponding folder of the approach in the repository.
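As an illustration, the extracted GloVe text file can be loaded into a simple word-to-vector dictionary; the file name below is an assumption and depends on which GloVe archive was downloaded.

```python
import numpy as np

def load_glove(path="resource/glove.6B.300d.txt"):
    """Minimal sketch of loading pre-trained GloVe vectors; the path is an
    assumed location following the 'resource' folder convention above."""
    embeddings = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # Each line is: word followed by its vector components.
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings
```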
Each of the above approaches is implemented in its own Jupyter notebook, where it can be trained (fine-tuned) on one of the data sets. At the end of each notebook, all information needed later, including the trained model weights and additional resources, is stored in checkpoint files. This training step is a prerequisite for using the models later for inference on new sentences in the Text2RelationGraph notebook.
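The following is a minimal, self-contained sketch of this checkpointing pattern with PyTorch; the file name, dictionary keys, and the stand-in classifier are illustrative assumptions, not the notebooks' actual code.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 64)                    # stand-in for a fine-tuned relation classifier
label2id = {"no_relation": 0, "uses": 1}      # hypothetical relation-label mapping

# Store the trained weights together with the resources needed for inference.
torch.save(
    {"model_state_dict": model.state_dict(), "label2id": label2id},
    "relation_model_checkpoint.pt",
)

# Later (e.g. in the Text2RelationGraph notebook) the stored state is restored.
checkpoint = torch.load("relation_model_checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```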
The notebook Text2RelationGraph implements the complete pipeline from unannotated text to RDF triples forming a knowledge graph. For this, one of the approaches can be chosen dynamically within the notebook. The notebook uses the previously trained and stored information from the approaches' individual notebooks.
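A minimal sketch of this final step with rdflib; the namespace and the example prediction are illustrative assumptions, not the notebook's actual vocabulary.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/fe/")      # hypothetical namespace for the graph
g = Graph()

predictions = [("Volkswagen", "produces", "ID3")]  # hypothetical extraction result
for head, relation, tail in predictions:
    # Each predicted (head, relation, tail) tuple becomes one RDF triple.
    g.add((EX[head], EX[relation], EX[tail]))

g.serialize(destination="relation_graph.ttl", format="turtle")
```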
Additionally, all approaches can be evaluated on different data sets. Metrics such as accuracy, precision, recall, and F1 score are calculated, and a confusion matrix is plotted.
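A sketch of such an evaluation on hypothetical labels, assuming scikit-learn is available (it is pulled in as a dependency of lime):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical gold labels and model predictions for illustration only.
y_true = ["uses", "produces", "no_relation", "uses"]
y_pred = ["uses", "no_relation", "no_relation", "uses"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision, "recall:", recall, "F1:", f1)
print(confusion_matrix(y_true, y_pred, labels=["uses", "produces", "no_relation"]))
```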