TY - JOUR
T1 - Towards the derivation of verbal content relations from patent claims using deep syntactic structures
AU - Ferraro, Gabriela
AU - Wanner, Leo
PY - 2011/12
Y1 - 2011/12
N2 - Research on the extraction of content relations from text corpora is a high-priority topic in natural language processing. This is not surprising, since content relations form the backbone of any ontology, and ontologies are increasingly used in knowledge-based applications. However, most work so far has focused on the detection of a restricted number of prominent verbal relations, in particular is-a, has-part and cause. Our application, which aims to provide comprehensive, easy-to-understand content representations of complex functional objects described in patent claims, needs to derive a large number of content relations that cannot be limited a priori. To cope with this problem, we take advantage of the fact that deep syntactic dependency structures of sentences capture all relevant content relations, although without any abstraction. We thus implement a three-step strategy. First, we parse the claims to retrieve the deep syntactic dependency structures, from which we then derive the content relations. Second, we generalize the obtained relations by clustering them according to semantic criteria, with the goal of uniting all sufficiently similar relations. Finally, we identify a suitable name for each generalized relation. To keep the scope of the article within reasonable limits and to allow for a comparison with state-of-the-art techniques, we focus on verbal relations.
AB - Research on the extraction of content relations from text corpora is a high-priority topic in natural language processing. This is not surprising, since content relations form the backbone of any ontology, and ontologies are increasingly used in knowledge-based applications. However, most work so far has focused on the detection of a restricted number of prominent verbal relations, in particular is-a, has-part and cause. Our application, which aims to provide comprehensive, easy-to-understand content representations of complex functional objects described in patent claims, needs to derive a large number of content relations that cannot be limited a priori. To cope with this problem, we take advantage of the fact that deep syntactic dependency structures of sentences capture all relevant content relations, although without any abstraction. We thus implement a three-step strategy. First, we parse the claims to retrieve the deep syntactic dependency structures, from which we then derive the content relations. Second, we generalize the obtained relations by clustering them according to semantic criteria, with the goal of uniting all sufficiently similar relations. Finally, we identify a suitable name for each generalized relation. To keep the scope of the article within reasonable limits and to allow for a comparison with state-of-the-art techniques, we focus on verbal relations.
KW - Cluster labeling
KW - Deep dependency parsing
KW - Dependency relation
KW - Relation clustering
KW - Specialized discourse
UR - http://www.scopus.com/inward/record.url?scp=80051470495&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2011.05.014
DO - 10.1016/j.knosys.2011.05.014
M3 - Article
SN - 0950-7051
VL - 24
SP - 1233
EP - 1244
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
IS - 8
ER -