- Improving Information Extraction from Visually Rich Documents using Visual Span Representations [Paper] (VLDB 2021) 🌟
- Bootstrapping Information Extraction via Conceptualization (ICDE 2021) 🌟
- Brief Introduction and Review of Open Information Extraction System [Slides]
- A Survey on Open Information Extraction [Paper]
- Open Information Extraction on Scientific Text: An Evaluation [Paper]
- Open Information Extraction (OIE) Resources Summary [Paper]
- Open Information Extraction from the Web (TextRunner, IJCAI 2007)
- Incoherent Extractions
- Uninformative Extractions
- MinIE: Minimizing Facts in Open Information Extraction (MinIE, EMNLP 2017) [Code (java)] [Code (python)]
- Represent information about polarity, modality, attribution and quantities with semantic annotations (instead of actual extraction)
- Identify and remove parts that are considered overly specific
- Facts that Matter (SALIE, EMNLP 2018) [Code]
- Extract salient facts, which fulfil two requirements: (1) relevance and (2) diversity
- Use syntactic constraints to specify relation phrases (3 simple patterns). Find longest phrase matching one of the syntactic constraints.
- Find the nearest noun phrases to the left and right of the relation phrase. Each argument must not be a relative pronoun, a WHO-adverb, or an existential "there".
- To avoid "over-specified" relation phrases, a relation phrase must occur with many distinct arguments in a large corpus
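The lexical constraint in the bullet above can be sketched as follows. This is a minimal illustration, assuming extractions are already available as `(arg1, relation_phrase, arg2)` tuples; the function name `filter_relation_phrases`, the threshold `min_distinct_args`, and the toy data are hypothetical and not taken from any released system.

```python
from collections import defaultdict


def filter_relation_phrases(extractions, min_distinct_args=2):
    """Keep relation phrases observed with enough distinct argument pairs.

    `min_distinct_args` is an illustrative threshold; a real corpus
    would call for a much larger value.
    """
    args_by_relation = defaultdict(set)
    for arg1, relation, arg2 in extractions:
        # A set ignores repeated (arg1, arg2) pairs, so only distinct pairs count.
        args_by_relation[relation].add((arg1, arg2))
    return {
        relation
        for relation, pairs in args_by_relation.items()
        if len(pairs) >= min_distinct_args
    }


# Toy corpus (made up for illustration).
extractions = [
    ("Obama", "was born in", "Hawaii"),
    ("Einstein", "was born in", "Ulm"),
    ("Obama", "was born in", "Hawaii"),  # duplicate pair, counted once
    ("The administration", "is offering only modest reduction targets at", "the conference"),
]

kept = filter_relation_phrases(extractions, min_distinct_args=2)
print(kept)
```

The over-specified phrase appears with a single argument pair and is dropped, while the general phrase survives.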
- ClausIE: Clause-Based Open Information Extraction (ClausIE, WWW 2013) [Paper][Code (Python)][Code (Java)]
- Map the dependency relations of an input sentence to clause constituents.
- A set of coherent clauses presenting a simple linguistic structure is derived from the input
- CycleOIE: A Low-Resource Training Framework For Open Information Extraction (COLING 2025) [Paper]
Open Relation Extraction (ORE)
- LOREM: Language-consistent Open Relation Extraction from Unstructured Text (WWW 2020)
- Topic-Oriented Open Relation Extraction with A Priori Seed Generation (EMNLP 2024) [Paper] 🔥
PriORE leverages the built-in knowledge of LLMs to maintain a dynamic seed relation dictionary for the topic.
General Papers
- Query-Driven On-The-Fly Knowledge Base Construction (QKBfly, VLDB 2017) relation clustering based on the PATTY dictionary 🌟
- CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information (CESI, WWW 2018) [Code] triple
- Canonicalizing Open Knowledge Bases (CIKM 2014) triple 🌟
- Towards Practical Open Knowledge Base Canonicalization (FAC, CIKM 2018) triple 🌟
- Identifying Relations for Open Information Extraction (ReVerb, EMNLP 2011) [Paper][Code][Homepage] relation
- Morphological Normalization
- Open Information Extraction to KBP Relations in 3 Hours (TAC 2013) [Paper]
- Main idea: map relation phrases to the KB ontology
- Manually define a set of rules for each relation, to conduct the mapping
- The motivation and error analysis are well written
- ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering (ClusType, KDD 2015) 🌟
- Relation Clustering: Two relation phrases tend to have similar cluster memberships if they have similar (1) strings; (2) context words; and (3) left and right argument type indicators
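The three similarity signals listed above can be illustrated with a small sketch. It is not the joint optimization used in the paper; it simply averages Jaccard overlaps over the three signals. The dictionary keys (`tokens`, `context_words`, `arg_types`), the function names, and the toy phrases are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relation_phrase_similarity(p, q):
    """Average Jaccard overlap over strings, context words, and argument types."""
    signals = ("tokens", "context_words", "arg_types")
    return sum(jaccard(p[s], q[s]) for s in signals) / len(signals)


# Toy relation phrases (made up for illustration).
born_in = {
    "tokens": ["was", "born", "in"],
    "context_words": ["birthplace", "city"],
    "arg_types": [("left", "person"), ("right", "location")],
}
native_of = {
    "tokens": ["born", "in"],
    "context_words": ["city", "hometown"],
    "arg_types": [("left", "person"), ("right", "location")],
}
acquired = {
    "tokens": ["acquired"],
    "context_words": ["deal", "merger"],
    "arg_types": [("left", "organization"), ("right", "organization")],
}

sim_close = relation_phrase_similarity(born_in, native_of)
sim_far = relation_phrase_similarity(born_in, acquired)
print(sim_close, sim_far)
```

Phrases that share tokens, context, and argument types score higher and would be more likely to land in the same cluster.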
- Unsupervised Methods for Determining Object and Relation Synonyms on the Web (Resolver, JAIR 2009) relation
- Relation Extraction with Matrix Factorization and Universal Schemas (NAACL-HLT 2013) [Paper]
- Close to relation clustering
- Create a universal schema by taking the union of surface-form predicates from Open IE and relations in the schemas of pre-existing databases
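A minimal sketch of the union step described above, assuming facts are given as `(entity_pair, relation)` tuples. It only builds the binary entity-pair by relation matrix that a factorization model would later complete; the function name `build_universal_schema`, the KB relation name, and the toy facts are hypothetical.

```python
def build_universal_schema(openie_facts, kb_facts):
    """Build a binary matrix over the union of Open IE and KB relations.

    Rows are entity pairs, columns are relations from either source,
    and a cell is 1 when that fact was observed.
    """
    all_facts = list(openie_facts) + list(kb_facts)
    columns = sorted({relation for _, relation in all_facts})
    rows = sorted({pair for pair, _ in all_facts})
    observed = set(all_facts)
    matrix = [
        [1 if (pair, relation) in observed else 0 for relation in columns]
        for pair in rows
    ]
    return rows, columns, matrix


# Toy facts (made up for illustration).
openie_facts = [
    (("Obama", "Hawaii"), "was born in"),
    (("Einstein", "Ulm"), "was born in"),
]
kb_facts = [
    (("Obama", "Hawaii"), "person/place_of_birth"),
]

rows, columns, matrix = build_universal_schema(openie_facts, kb_facts)
for pair, cells in zip(rows, matrix):
    print(pair, dict(zip(columns, cells)))
```

The zero cell for the Einstein pair under the KB relation is the kind of entry a matrix factorization model would try to predict.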
- Canonicalization of Open Knowledge Bases with Side Information from the Source Text (ICDE 2018) [Paper] 🌟
- Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network (2020)[Paper]
- Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders (2020) [Paper]
- CaSIE: Canonicalize and Informative Selection of the OpenIE System (ICDE 2021, short) 🌟
- Joint Entity and Relation Linking with Coherence Relaxation (SIGMOD 2021) 🌟 [Paper]
- Multi-level feature interaction for open knowledge base canonicalization (Knowledge-Based Systems 2024) [Paper]
- Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning (WWW 2024) [Paper]
- Enhancing Domain-Independent Knowledge Graph Construction through OpenIE Cleaning and LLMs Validation (International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 2024) [Paper] 🔥
Relation Phrases Clustering (finding synonymous phrases and hypernyms)
- HARPY: Hypernyms and Alignment of Relational Paraphrases (HARPY, COLING 2014) [Paper][Data]
- POLY: Mining Relational Paraphrases from Multilingual Sentences (POLY, EMNLP 2016) [Paper][Data]
- Make use of another language
- RELLY: Inferring Hypernym Relationships Between Relational Phrases (RELLY, EMNLP 2015) [Paper][Data]
- PATTY: A Taxonomy of Relational Patterns with Semantic Types (PATTY, EMNLP 2012) [Paper][Data]
- Discovering and Exploring Relations on the Web (PATTY demo, VLDB 2012) [Paper] 🌟
- Ensemble Semantics for Large-Scale Unsupervised Relation Extraction (WEBRE, EMNLP-CoNLL 2012) relation
- Relation Schema Induction using Tensor Factorization with Side Information (SICTF, EMNLP 2016) relation schema induction (for building domain-specific kb from unstructured text) [Code]
- Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations (IJCAI 2015)
Other Canonicalization
- Constructing Explainable Opinion Graphs from Reviews (WWW 2021) [Paper]
- Canonicalize opinion phrases
Related Works
- Query-Efficient Correlation Clustering (WWW 2020)
Clustering Methods Used for Canonicalization
Note: So far, most of the papers I have read employ hierarchical agglomerative clustering (HAC) for canonicalization, for two major reasons: (1) it needs no predefined number of clusters, and (2) it is insensitive to the choice of similarity metric. The standard HAC algorithm has a time complexity of O(n^3) and requires O(n^2) memory.
- An efficient algorithm for a complete link method. Comput. J. 20, 4 (1977), complete linkage, O(n^2)
- A Hierarchical Algorithm for Extreme Clustering (KDD 2017), an approximate hierarchical clustering algorithm that trades some loss in performance for scalability.
- SLINK: an optimally efficient algorithm for the single-link cluster method, O(n^2)
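For reference, a naive complete-linkage HAC with a distance threshold can be sketched as below. It illustrates why no cluster count is needed up front (merging stops at the threshold) and why the straightforward version is cubic: each of up to n merge rounds scans all cluster pairs. This is a toy sketch, not one of the optimized algorithms listed above; `naive_hac`, `jaccard_distance`, the threshold value, and the phrases are assumptions chosen for illustration.

```python
def jaccard_distance(a, b):
    """1 - Jaccard overlap between the token sets of two phrases."""
    ta, tb = set(a.split()), set(b.split())
    return 1.0 - len(ta & tb) / len(ta | tb)


def naive_hac(items, distance, threshold):
    """Complete-linkage HAC that stops when the closest pair exceeds `threshold`."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the farthest member pair.
                d = max(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy relation phrases (made up for illustration).
phrases = ["was born in", "born in", "is the capital of", "capital of"]
clusters = naive_hac(phrases, jaccard_distance, threshold=0.6)
print(clusters)
```

Swapping `max` for `min` in the linkage line gives single linkage, the setting addressed by SLINK.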
Benchmarks
- BenchIE^FL: A Manually Re-Annotated Fact-Based Open Information Extraction Benchmark (ACL 2024) [Paper]
- Integrating Local Context and Global Cohesiveness for Open Information Extraction (ReMine, WSDM 2019)
- Solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective.
- Extracting Knowledge from Web Text with Monte Carlo Tree Search (WWW 2020, short paper)
- OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference (NAACL 2019)
- On Aligning OpenIE Extractions with Knowledge Bases: A Case Study (ACL 2020) [Paper]
- Syntactic and Semantic-driven Learning for Open Information Extraction (EMNLP 2020) [Paper]
- CAG: A Consistency-Adaptive Text-Image Alignment Generation for Joint Multimodal Entity-Relation Extraction (CIKM 2024) [Paper]