General Information Extraction

General Papers

  1. Improving Information Extraction from Visually Rich Documents using Visual Span Representations [Paper] (VLDB 2021) 🌟
  2. Bootstrapping Information Extraction via Conceptualization (ICDE 2021) 🌟

Open Information Extraction

Slides, Tutorials and Surveys

  1. Brief Introduction and Review of Open Information Extraction System [Slides]
  2. A Survey on Open Information Extraction [Paper]
  3. Open Information Extraction on Scientific Text: An Evaluation [Paper]
  4. Open Information Extraction (OIE) Resources Summary [Paper]

OpenIE Tools or Works

  1. Open Information Extraction from the Web (TextRunner, IJCAI 2007)
     • Suffers from incoherent extractions
     • Suffers from uninformative extractions
  2. MinIE: Minimizing Facts in Open Information Extraction (MinIE, EMNLP 2017) [Code (Java)] [Code (Python)]
     • Represent information about polarity, modality, attribution and quantities with semantic annotations (instead of keeping it in the actual extraction)
     • Identify and remove parts that are considered overly specific
  3. Facts that Matter (SALIE, EMNLP 2018) [Code]
     • Extract salient facts, which fulfil two requirements: (1) relevance and (2) diversity
  4. Identifying Relations for Open Information Extraction (ReVerb, EMNLP 2011) [Paper][Code][Homepage]
     • Use syntactic constraints (3 simple POS patterns) to specify relation phrases, and take the longest phrase matching one of them.
     • Find the nearest noun phrases to the left and right of the relation phrase; the left argument must not be a relative pronoun, a WH-adverb, or an existential "there".
     • To avoid over-specified relation phrases, a relation phrase must take many distinct arguments in a large corpus.
  5. ClausIE: Clause-Based Open Information Extraction (ClausIE, WWW 2013) [Paper][Code (Python)][Code (Java)]
     • Map the dependency relations of an input sentence to clause constituents.
     • Derive from the input a set of coherent clauses that have a simple linguistic structure.
  6. CycleOIE: A Low-Resource Training Framework For Open Information Extraction (COLING 2025) [Paper]
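
ReVerb's syntactic constraint is usually written as the POS pattern V | V P | V W* P, where V = verb particle? adv?, W = noun | adj | adv | pron | det, and P = prep | particle | infinitive marker. The sketch below applies that pattern with a regular expression over coarse tag codes and adds the "many distinct arguments" filter. It is a minimal illustration, not ReVerb's code: it assumes the sentence is already tagged with Penn Treebank tags, and the tag mapping, the function names and the `min_args` threshold are all hypothetical.

```python
import re
from collections import defaultdict

# Coarse word classes over Penn Treebank tags (a simplification assumed here).
def coarse(tag: str) -> str:
    if tag.startswith("VB"):
        return "V"  # verb
    if tag == "RP":
        return "R"  # particle
    if tag.startswith("RB"):
        return "A"  # adverb
    if tag.startswith(("NN", "JJ", "PRP")) or tag == "DT":
        return "N"  # noun, adjective, pronoun or determiner
    if tag == "IN":
        return "P"  # preposition
    if tag == "TO":
        return "T"  # infinitive marker "to"
    return "O"  # anything else

# V | V P | V W* P  ==  V (W* P)?, with V = verb particle? adv?,
# W = noun/adj/adv/pron/det, P = prep/particle/infinitive marker.
RELATION = re.compile(r"VR?A?(?:[NA]*[PRT])?")

def relation_phrases(tokens: list[str], tags: list[str]) -> list[str]:
    """Longest pattern matches, with directly adjacent matches merged."""
    codes = "".join(coarse(t) for t in tags)
    phrases, i = [], 0
    while i < len(codes):
        m = RELATION.match(codes, i)
        if not m:
            i += 1
            continue
        end = m.end()
        # Merge adjacent matches, e.g. "wants to" + "extend".
        while (nxt := RELATION.match(codes, end)):
            end = nxt.end()
        phrases.append(" ".join(tokens[i:end]))
        i = end
    return phrases

def lexical_filter(extractions, min_args: int) -> set[str]:
    """Keep relation phrases seen with at least `min_args` distinct argument pairs."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in extractions:
        pairs[rel.lower()].add((arg1.lower(), arg2.lower()))
    return {rel for rel, seen in pairs.items() if len(seen) >= min_args}
```

For "Faust made a deal with the devil" (tags NNP VBD DT NN IN DT NN), `relation_phrases` returns `["made a deal with"]`: the greedy `[NA]*` absorbs "a deal" before the preposition closes the phrase.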

Open Relation Extraction (ORE)

  1. LOREM: Language-consistent Open Relation Extraction from Unstructured Text (WWW 2020)
  2. Topic-Oriented Open Relation Extraction with A Priori Seed Generation (EMNLP 2024) [Paper] 🔥

PriORE leverages the built-in knowledge of LLMs to maintain a dynamic seed relation dictionary for the topic.

Canonicalization of Open Knowledge Bases, OpenIE Triple Clustering

General Papers

  1. Query-Driven On-The-Fly Knowledge Base Construction (QKBfly, VLDB 2017) relation clustering based on the PATTY dictionary 🌟
  2. CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information (CESI, WWW 2018) [Code] triple
  3. Canonicalizing Open Knowledge Bases (CIKM 2014) triple 🌟
  4. Towards Practical Open Knowledge Base Canonicalization (FAC, CIKM 2018) triple 🌟
  5. Identifying Relations for Open Information Extraction (ReVerb, EMNLP 2011) [Paper][Code][Homepage] relation
     • Morphological normalization
  6. Open Information Extraction to KBP Relations in 3 Hours (TAC 2013) [Paper]
     • Main idea: map relation phrases to the KB ontology
     • Manually define a set of rules for each relation to perform the mapping
     • The motivation and error analysis are well written
  7. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering (ClusType, KDD 2015) 🌟
     • Relation clustering: two relation phrases tend to have similar cluster memberships if they have similar (1) strings, (2) context words, and (3) left and right argument type indicators
  8. Unsupervised Methods for Determining Object and Relation Synonyms on the Web (Resolver, JAIR 2009) relation
  9. Relation Extraction with Matrix Factorization and Universal Schemas (NAACL-HLT 2013) [Paper]
     • Close to relation clustering
     • Create a universal schema by taking the union of surface-form predicates from Open IE and the relations in the schemas of pre-existing databases
  10. Canonicalization of Open Knowledge Bases with Side Information from the Source Text (ICDE 2018) [Paper] 🌟
  11. Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network (2020) [Paper]
  12. Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders (2020) [Paper]
  13. CaSIE: Canonicalize and Informative Selection of the OpenIE System (ICDE 2021, short) 🌟
  14. Joint Entity and Relation Linking with Coherence Relaxation (SIGMOD 2021) 🌟 [Paper]
  15. Multi-level feature interaction for open knowledge base canonicalization (Knowledge-Based Systems 2024) [Paper]
  16. Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning (WWW 2024) [Paper]
  17. Enhancing Domain-Independent Knowledge Graph Construction through OpenIE Cleaning and LLMs Validation (International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 2024) [Paper] 🔥
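
ClusType's relation-clustering intuition combines three signals: the phrase string, its context words, and the types of its left and right arguments. The sketch below turns those three views into one similarity score using cosine similarity per view. It only illustrates the intuition, assuming bag-of-words features and a fixed weighted average; `RelationPhrase`, `similarity` and the equal `weights` are hypothetical, and ClusType itself learns cluster memberships jointly rather than scoring pairs like this.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class RelationPhrase:
    text: str
    context: Counter = field(default_factory=Counter)      # words around the phrase's mentions
    left_types: Counter = field(default_factory=Counter)   # entity types seen as left argument
    right_types: Counter = field(default_factory=Counter)  # entity types seen as right argument

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity(p: RelationPhrase, q: RelationPhrase, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted mix of string, context and argument-type similarity (weights are hypothetical)."""
    string_sim = cosine(Counter(p.text.lower().split()), Counter(q.text.lower().split()))
    context_sim = cosine(p.context, q.context)
    type_sim = (cosine(p.left_types, q.left_types) + cosine(p.right_types, q.right_types)) / 2
    w_string, w_context, w_type = weights
    return w_string * string_sim + w_context * context_sim + w_type * type_sim
```

With these toy features, "born in" and "was born in" (same context words, person-to-location arguments) score about 0.94, while "born in" and "acquired" (company-to-company arguments, no shared words) score 0.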

Relation Phrase Clustering (finding synonymous phrases and hypernyms)

  1. HARPY: Hypernyms and Alignment of Relational Paraphrases (HARPY, COLING 2014) [Paper][Data]
  2. POLY: Mining Relational Paraphrases from Multilingual Sentences (POLY, EMNLP 2016) [Paper][Data]
     • Makes use of another language
  3. RELLY: Inferring Hypernym Relationships Between Relational Phrases (RELLY, EMNLP 2015) [Paper][Data]
  4. PATTY: A Taxonomy of Relational Patterns with Semantic Types (PATTY, EMNLP 2012) [Paper][Data]
  5. Discovering and Exploring Relations on the Web (PATTY demo, VLDB 2012) [Paper] 🌟
  6. Ensemble Semantics for Large-Scale Unsupervised Relation Extraction (WEBRE, EMNLP-CoNLL 2012) relation
  7. Relation Schema Induction using Tensor Factorization with Side Information (SICTF, EMNLP 2016) relation schema induction (for building domain-specific KBs from unstructured text) [Code]
  8. Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations (IJCAI 2015)
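
A common way to separate synonyms from hypernyms among relation phrases is to compare their support sets, i.e. the entity pairs each phrase is observed with: mutual inclusion suggests synonymy, one-way inclusion suggests that the broader phrase subsumes the narrower one. The sketch below is a toy version of that idea with a soft inclusion threshold. It is an illustration under stated assumptions, not the algorithm of PATTY or any paper above; `coverage`, `relate_phrases` and the default `threshold=0.8` are hypothetical.

```python
def coverage(general: set, specific: set) -> float:
    """Fraction of `specific`'s entity pairs that `general` also covers."""
    return len(general & specific) / len(specific) if specific else 0.0

def relate_phrases(support: dict[str, set], threshold: float = 0.8):
    """Label phrase pairs as synonyms or (hypernym, hyponym) via support-set inclusion."""
    synonyms, hypernyms = [], []
    phrases = sorted(support)
    for i, a in enumerate(phrases):
        for b in phrases[i + 1:]:
            a_within_b = coverage(support[b], support[a])  # share of a's pairs also seen with b
            b_within_a = coverage(support[a], support[b])  # share of b's pairs also seen with a
            if a_within_b >= threshold and b_within_a >= threshold:
                synonyms.append((a, b))    # mutual inclusion: near-identical support
            elif a_within_b >= threshold:
                hypernyms.append((b, a))   # b subsumes a
            elif b_within_a >= threshold:
                hypernyms.append((a, b))   # a subsumes b
    return synonyms, hypernyms
```

On toy support sets where "ceo of" and "chief executive of" share the same two entity pairs and "works for" covers those plus two more, the sketch labels the first two as synonyms and "works for" as the hypernym of both.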

Other Canonicalization

  1. Constructing Explainable Opinion Graphs from Reviews (WWW 2021) [Paper]
     • Canonicalizes opinion phrases

Related Works

  1. Query-Efficient Correlation Clustering (WWW 2020)

Clustering Methods Used for Canonicalization

Note: So far, most of the papers I have read use hierarchical agglomerative clustering (HAC) for canonicalization, for two major reasons: (1) it does not require a predefined number of clusters, and (2) it is insensitive to the choice of similarity metric. The standard HAC algorithm has a time complexity of O(n^3) and requires O(n^2) memory.
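
Both reasons in the note show up directly in how HAC is typically used for canonicalization: instead of fixing k, merging stops once the closest pair of clusters is farther apart than a distance threshold. The sketch below is a naive average-linkage HAC, assuming a precomputed symmetric distance matrix (for example 1 minus the cosine similarity of phrase embeddings); `hac` and the threshold values are hypothetical. Recomputing all cluster distances in every round costs O(n^2) per merge, so the whole run is O(n^3) time with an O(n^2) distance matrix, matching the cost stated above.

```python
from itertools import combinations

def hac(dist: list[list[float]], threshold: float) -> list[list[int]]:
    """Naive average-linkage HAC that stops once the closest clusters exceed `threshold`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best, pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster point pairs.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best:
                best, pair = d, (a, b)
        if best > threshold:
            break  # no predefined k: the threshold decides when to stop merging
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]  # combinations yields a < b, so index a stays valid
    return clusters
```

With two tight pairs of points (within-pair distances 0.1 and 0.2, cross-pair distance 0.9), a threshold of 0.5 yields `[[0, 1], [2, 3]]`, while a threshold of 1.0 merges everything into one cluster.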

  1. An efficient algorithm for a complete link method. Comput. J. 20, 4 (1977), complete linkage, O(n^2)
  2. A Hierarchical Algorithm for Extreme Clustering (KDD 2017), approximates hierarchical clustering at the cost of some loss in clustering quality.
  3. SLINK: an optimally efficient algorithm for the single-link cluster method, O(n^2)
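
SLINK builds the single-link dendrogram in O(n^2) time by maintaining a pointer representation: for each point i, `pi[i]` is the later point it merges with and `lam[i]` is the height of that merge. The sketch below transcribes Sibson's update rules as they are commonly stated, assuming a precomputed distance matrix (SLINK itself only needs one row of distances at a time). The `clusters_at` helper, which cuts the dendrogram at a threshold with union-find, is a hypothetical addition for illustration.

```python
import math

def slink(dist: list[list[float]]) -> tuple[list[int], list[float]]:
    """Sibson's SLINK: pointer representation (pi, lam) of the single-link dendrogram."""
    n = len(dist)
    pi, lam, m = [0] * n, [math.inf] * n, [0.0] * n
    for k in range(n):
        pi[k], lam[k] = k, math.inf
        for i in range(k):
            m[i] = dist[i][k]  # distances from the new point k
        for i in range(k):
            if lam[i] >= m[i]:
                m[pi[i]] = min(m[pi[i]], lam[i])
                lam[i] = m[i]
                pi[i] = k
            else:
                m[pi[i]] = min(m[pi[i]], m[i])
        for i in range(k):
            if lam[i] >= lam[pi[i]]:
                pi[i] = k  # re-point to k when pi[i] now merges no earlier than i
    return pi, lam

def clusters_at(pi: list[int], lam: list[float], threshold: float) -> list[list[int]]:
    """Cut the dendrogram: i joins pi[i]'s cluster if it merged at height <= threshold."""
    parent = list(range(len(pi)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, (p, h) in enumerate(zip(pi, lam)):
        if h <= threshold:
            parent[find(i)] = find(p)
    groups: dict[int, list[int]] = {}
    for i in range(len(pi)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

On four points where 0 and 1 are at distance 0.1, 2 and 3 at 0.2, and the pairs are 0.9 apart, `slink` returns `pi = [1, 3, 3, 3]` and `lam = [0.1, 0.9, 0.2, inf]`, and cutting at 0.5 gives `[[0, 1], [2, 3]]`.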

Benchmarks

  1. BenchIE^FL: A Manually Re-Annotated Fact-Based Open Information Extraction Benchmark (ACL 2024) [Paper]

Other Interesting Works about Open IE

  1. Integrating Local Context and Global Cohesiveness for Open Information Extraction (ReMine, WSDM 2019)
     • Solves a joint optimization problem that unifies (1) segmenting entity/relation phrases in individual sentences based on local context, and (2) measuring the quality of tuples extracted from individual sentences with a translation-based objective.
  2. Extracting Knowledge from Web Text with Monte Carlo Tree Search (WWW 2020, short paper)
  3. OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference (NAACL 2019)
  4. On Aligning OpenIE Extractions with Knowledge Bases: A Case Study (ACL 2020) [Paper]
  5. Syntactic and Semantic-driven Learning for Open Information Extraction (EMNLP 2020) [Paper]
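
A translation-based objective treats a relation as a vector offset between entity embeddings, so a good tuple (h, r, t) should satisfy h + r ≈ t. The sketch below shows only that scoring idea, TransE-style L2 distance used to rank extracted tuples; it is not ReMine's objective, which also couples scoring with phrase segmentation and learns the embeddings. The toy 2-d vectors and the names `translation_distance` and `rank_tuples` are hypothetical.

```python
import math

def translation_distance(h, r, t) -> float:
    """L2 distance ||h + r - t||; smaller means the tuple fits the translation assumption better."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tuples(tuples, entity_vec, relation_vec):
    """Sort (head, relation, tail) tuples from most to least plausible."""
    return sorted(
        tuples,
        key=lambda x: translation_distance(entity_vec[x[0]], relation_vec[x[1]], entity_vec[x[2]]),
    )
```

With toy vectors paris = (1, 0), france = (1, 1), tokyo = (0, 0), japan = (0, 1) and "capital of" = (0, 1), the tuple (paris, capital of, japan) has distance 1.0 and ranks below the two consistent tuples, which have distance 0.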

Joint Multimodal Entity-Relation Extraction (JMERE)

Research Papers

  1. CAG: A Consistency-Adaptive Text-Image Alignment Generation for Joint Multimodal Entity-Relation Extraction (CIKM 2024) [Paper]