TextEE

Updates | Datasets | Models | Environment | Running | Results | Website | Paper

Authors: Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji

Introduction

TextEE is a standardized, fair, and reproducible benchmark for evaluating event extraction approaches.

  • Standardized data preprocessing for 10+ datasets.
  • Standardized data splits for reducing performance variance.
  • 10+ implemented event extraction approaches published in recent years.
  • Comprehensive reevaluation results for future reference.

For more details, please see our paper TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction. We will keep adding new datasets and new models!

Updates

  • [04/21/2024] TextEE supports two more datasets: SPEED and MUC-4.
  • [02/23/2024] TextEE supports the CEDAR approach now.
  • [12/26/2023] TextEE supports three more datasets: MLEE, Genia2011, Genia2013.
  • [11/15/2023] We release TextEE, a framework for reevaluation and benchmark for event extraction. Feel free to contact us ([email protected]) if you want to contribute your models or datasets!

Supported Datasets

| Dataset Name | Task | Paper Title | Venue |
|---|---|---|---|
| ACE05 | E2E, ED, EAE | The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation | LREC 2004 |
| ERE | E2E, ED, EAE | From Light to Rich ERE: Annotation of Entities, Relations, and Events | EVENTS@NAACL 2015 |
| MLEE | E2E, ED, EAE | Event extraction across multiple levels of biological organization | Bioinformatics 2012 |
| Genia2011 | E2E, ED, EAE | Overview of Genia Event Task in BioNLP Shared Task 2011 | BioNLP Shared Task 2011 Workshop |
| Genia2013 | E2E, ED, EAE | The Genia Event Extraction Shared Task, 2013 Edition - Overview | BioNLP Shared Task 2013 Workshop |
| M2E2 | E2E, ED, EAE | Cross-media Structured Common Space for Multimedia Event Extraction | ACL 2020 |
| CASIE | E2E, ED, EAE | CASIE: Extracting Cybersecurity Event Information from Text | AAAI 2020 |
| PHEE | E2E, ED, EAE | PHEE: A Dataset for Pharmacovigilance Event Extraction from Text | EMNLP 2022 |
| MEE | ED | MEE: A Novel Multilingual Event Extraction Dataset | EMNLP 2022 |
| FewEvent | ED | Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection | WSDM 2020 |
| MAVEN | ED | MAVEN: A Massive General Domain Event Detection Dataset | EMNLP 2020 |
| SPEED | ED | Event Detection from Social Media for Epidemic Prediction | NAACL 2024 |
| MUC-4 | EAE | Fourth Message Understanding Conference | MUC-4 1992 |
| RAMS | EAE | Multi-Sentence Argument Linking | ACL 2020 |
| WikiEvents | EAE | Document-Level Event Argument Extraction by Conditional Generation | NAACL 2021 |
| GENEVA | EAE | GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | ACL 2023 |

Supported Models

| Model Name | Task | Paper Title | Venue |
|---|---|---|---|
| DyGIE++ | E2E | Entity, Relation, and Event Extraction with Contextualized Span Representations | EMNLP 2019 |
| OneIE | E2E | A Joint Neural Model for Information Extraction with Global Features | ACL 2020 |
| AMR-IE | E2E | Abstract Meaning Representation Guided Graph Encoding and Decoding for Joint Information Extraction | NAACL 2021 |
| DEGREE | E2E, ED, EAE | DEGREE: A Data-Efficient Generation-Based Event Extraction Model | NAACL 2022 |
| EEQA | ED, EAE | Event Extraction by Answering (Almost) Natural Questions | EMNLP 2020 |
| RCEE | ED, EAE | Event Extraction as Machine Reading Comprehension | EMNLP 2020 |
| Query&Extract | ED, EAE | Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding | ACL-Findings 2022 |
| TagPrime | ED, EAE | TAGPRIME: A Unified Framework for Relational Structure Extraction | ACL 2023 |
| UniST | ED | Unified Semantic Typing with Meaningful Label Inference | NAACL 2022 |
| CEDAR | ED | GLEN: General-Purpose Event Detection for Thousands of Types | EMNLP 2023 |
| BART-Gen | EAE | Document-Level Event Argument Extraction by Conditional Generation | NAACL 2021 |
| PAIE | EAE | Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument Extraction | ACL 2022 |
| X-Gear | EAE | Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction | ACL 2022 |
| AMPERE | EAE | AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model | ACL 2023 |

Reevaluation Results

Please check here.

Environment

  1. Please install the following packages from both conda and pip.
conda install
  - python 3.8
  - pytorch 2.0.1
  - numpy 1.24.3
  - ipdb 0.13.13
  - tqdm 4.65.0
  - beautifulsoup4 4.11.1
  - lxml 4.9.1
  - jsonlines 3.1.0
  - jsonnet 0.20.0
  - stanza 1.5.0
pip install
  - transformers 4.30.0
  - sentencepiece 0.1.96
  - scipy 1.5.4
  - spacy 3.1.4
  - nltk 3.8.1
  - tensorboardX 2.6
  - keras-preprocessing 1.1.2
  - keras 2.4.3
  - dgl-cu111 0.6.1
  - amrlib 0.7.1
  - cached_property 1.5.2
  - typing-extensions 4.4.0
  - penman 1.2.2

Alternatively, you can use the following command.

conda env create -f env.yml
  2. Run the following command.
python -m spacy download en_core_web_lg
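
After setting up the environment, it can help to confirm that the installed versions match the pinned ones. The following is a minimal sketch, not part of TextEE: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, and the versions are copied from a subset of the pip list above. It uses only the standard-library `importlib.metadata` module, available from Python 3.8.

```python
from importlib import metadata

# Pinned versions copied from the pip list above (a subset, for illustration).
# Keys are assumed to be the distribution names as published on PyPI.
PINNED = {
    "transformers": "4.30.0",
    "sentencepiece": "0.1.96",
    "scipy": "1.5.4",
    "spacy": "3.1.4",
    "nltk": "3.8.1",
    "penman": "1.2.2",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    for name, (expected, found) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Packages installed through conda can be added to `PINNED` in the same way, although some builds report a version string with a local suffix, so an exact string comparison may be too strict for those.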

Running

Training

./scripts/train.sh [config]

Evaluation for End-to-End Model

# Evaluating End-to-End
python TextEE/evaluate_end2end.py --task E2E --data [eval_data] --model [saved_model_folder]

# Evaluating EAE
python TextEE/evaluate_end2end.py --task EAE --data [eval_data] --model [saved_model_folder]

Evaluation for Pipeline Model

# Evaluating ED
python TextEE/evaluate_pipeline.py --task ED --data [eval_data] --ed_model [saved_model_folder]

# Evaluating EAE
python TextEE/evaluate_pipeline.py --task EAE --data [eval_data] --eae_model [saved_model_folder]

# Evaluating ED+EAE
python TextEE/evaluate_pipeline.py --task E2E --data [eval_data] --ed_model [saved_model_folder] --eae_model [saved_model_folder]
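
To run several pipeline evaluations from a script, the commands above can be assembled programmatically. This is a rough sketch, not part of TextEE: `build_eval_command` and `run_eval` are hypothetical helpers, and the only flags used are the ones shown in the commands above (`--task`, `--data`, `--ed_model`, `--eae_model`). It assumes the script is launched from the repository root.

```python
import subprocess
import sys


def build_eval_command(task, data, ed_model=None, eae_model=None):
    """Build the argument list for TextEE/evaluate_pipeline.py.

    Mirrors the three invocations shown above: ED needs an ED model,
    EAE needs an EAE model, and E2E (ED+EAE) needs both.
    """
    if task not in ("ED", "EAE", "E2E"):
        raise ValueError(f"unknown task: {task}")
    needs_ed = task in ("ED", "E2E")
    needs_eae = task in ("EAE", "E2E")
    if needs_ed and ed_model is None:
        raise ValueError(f"task {task} requires ed_model")
    if needs_eae and eae_model is None:
        raise ValueError(f"task {task} requires eae_model")

    cmd = [sys.executable, "TextEE/evaluate_pipeline.py",
           "--task", task, "--data", data]
    if needs_ed:
        cmd += ["--ed_model", ed_model]
    if needs_eae:
        cmd += ["--eae_model", eae_model]
    return cmd


def run_eval(task, data, ed_model=None, eae_model=None):
    """Run one evaluation and raise if the script exits with an error."""
    subprocess.run(build_eval_command(task, data, ed_model, eae_model),
                   check=True)
```

For example, `run_eval("E2E", eval_data, ed_model=ed_dir, eae_model=eae_dir)` would issue the ED+EAE command, where `eval_data`, `ed_dir`, and `eae_dir` stand for your own evaluation file and saved model folders.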

Making Predictions for New Texts with End-to-End Model

# Predicting End-to-End
python TextEE/predict_end2end.py --input_file demo_input.txt --model [saved_model_folder] --output_file demo_output.json

Making Predictions for New Texts with Pipeline Model

# Predicting ED+EAE
python TextEE/predict_pipeline.py --input_file demo_input.txt --ed_model [saved_model_folder] --eae_model [saved_model_folder] --output_file demo_output.json

Citation

@article{Huang23textee,
  author       = {Kuan{-}Hao Huang and
                  I{-}Hung Hsu and
                  Tanmay Parekh and 
                  Zhiyu Xie and
                  Zixuan Zhang and
                  Premkumar Natarajan and
                  Kai{-}Wei Chang and
                  Nanyun Peng and
                  Heng Ji},
  title        = {TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction},
  journal      = {arXiv preprint arXiv:2311.09562},
  year         = {2023},
}