TY - JOUR
T1 - Active knowledge graph completion
AU - Omran, Pouya Ghiasnezhad
AU - Taylor, Kerry
AU - Mendez, Sergio Rodriguez
AU - Haller, Armin
N1 - Publisher Copyright:
© 2022
PY - 2022/8
Y1 - 2022/8
N2 - Enterprise and public Knowledge Graphs (KGs) are known to be incomplete. Methods for automatic completion, sometimes by rule learning, scale well. While previous rule-based methods learn closed (non-existential) rules, we introduce Open Path (OP) rules that are constrained existential rules. We present a novel algorithm, OPRL, for learning OP rules. Closed rules complete a KG by answering queries of unclear origin, usually derived from a holdback test set in experimental settings. However, OP rules can generate relevant queries for KG completion. OPRL generates queries even when there is no closed rule to answer the query, or when the correct answer is a missing entity that is not present in the KG. For OPRL to scale well, we propose a novel embedding-based fitness function to efficiently estimate rule quality. Additionally, we introduce a novel, efficient vector computation to formally assess rule quality. We evaluate OPRL using adaptations of Freebase, YAGO2, Wikidata, and a synthetic Poker KG. We find that OPRL mines hundreds of accurate rules from massive KGs with up to 8 M facts. The OP rules generate queries with precision as high as 98% and recall of 62% on a complete KG, demonstrating the first solution for active knowledge graph completion.
AB - Enterprise and public Knowledge Graphs (KGs) are known to be incomplete. Methods for automatic completion, sometimes by rule learning, scale well. While previous rule-based methods learn closed (non-existential) rules, we introduce Open Path (OP) rules that are constrained existential rules. We present a novel algorithm, OPRL, for learning OP rules. Closed rules complete a KG by answering queries of unclear origin, usually derived from a holdback test set in experimental settings. However, OP rules can generate relevant queries for KG completion. OPRL generates queries even when there is no closed rule to answer the query, or when the correct answer is a missing entity that is not present in the KG. For OPRL to scale well, we propose a novel embedding-based fitness function to efficiently estimate rule quality. Additionally, we introduce a novel, efficient vector computation to formally assess rule quality. We evaluate OPRL using adaptations of Freebase, YAGO2, Wikidata, and a synthetic Poker KG. We find that OPRL mines hundreds of accurate rules from massive KGs with up to 8 M facts. The OP rules generate queries with precision as high as 98% and recall of 62% on a complete KG, demonstrating the first solution for active knowledge graph completion.
KW - Existential rule learning
KW - Knowledge graph
KW - Knowledge graph completion
KW - Rule learning
UR - http://www.scopus.com/inward/record.url?scp=85130144530&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2022.05.027
DO - 10.1016/j.ins.2022.05.027
M3 - Article
SN - 0020-0255
VL - 604
SP - 267
EP - 279
JO - Information Sciences
JF - Information Sciences
ER -