TY - GEN
T1 - TNNT
T2 - 11th ACM International Conference on Knowledge Capture, K-CAP 2021
AU - Seneviratne, Sandaru
AU - Rodríguez Méndez, Sergio J.
AU - Zhang, Xuecheng
AU - Omran, Pouya G.
AU - Taylor, Kerry
AU - Haller, Armin
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/12/2
Y1 - 2021/12/2
N2 - Extraction of categorised named entities from text is a complex task given the availability of a variety of Named Entity Recognition (NER) models and the unstructured information encoded in different source document formats. Processing the documents to extract text, identifying suitable NER models for a task, and obtaining statistical information is important in data analysis to make informed decisions. This paper presents TNNT, a toolkit that automates the extraction of categorised named entities from unstructured information encoded in source documents, using diverse state-of-the-art (SOTA) Natural Language Processing (NLP) tools and NER models. TNNT integrates 21 different NER models as part of a Knowledge Graph Construction Pipeline (KGCP) that takes a document set as input and processes it based on the defined settings, applying the selected blocks of NER models to output the results. The toolkit generates all results with an integrated summary of the extracted entities, enabling enhanced data analysis to support the KGCP, and also, to aid further NLP tasks. (Footnote: The manuscript follows guidelines to showcase a demonstration that introduces an overview of how the toolkit works: input document set, initial settings, processing, and output set. The input document set is artificial in order to show various toolkit capabilities.)
AB - Extraction of categorised named entities from text is a complex task given the availability of a variety of Named Entity Recognition (NER) models and the unstructured information encoded in different source document formats. Processing the documents to extract text, identifying suitable NER models for a task, and obtaining statistical information is important in data analysis to make informed decisions. This paper presents TNNT, a toolkit that automates the extraction of categorised named entities from unstructured information encoded in source documents, using diverse state-of-the-art (SOTA) Natural Language Processing (NLP) tools and NER models. TNNT integrates 21 different NER models as part of a Knowledge Graph Construction Pipeline (KGCP) that takes a document set as input and processes it based on the defined settings, applying the selected blocks of NER models to output the results. The toolkit generates all results with an integrated summary of the extracted entities, enabling enhanced data analysis to support the KGCP, and also, to aid further NLP tasks. (Footnote: The manuscript follows guidelines to showcase a demonstration that introduces an overview of how the toolkit works: input document set, initial settings, processing, and output set. The input document set is artificial in order to show various toolkit capabilities.)
KW - information extraction
KW - knowledge graph construction pipeline
KW - named entity recognition
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85120920823&partnerID=8YFLogxK
U2 - 10.1145/3460210.3493550
DO - 10.1145/3460210.3493550
M3 - Conference contribution
T3 - K-CAP 2021 - Proceedings of the 11th Knowledge Capture Conference
SP - 249
EP - 252
BT - K-CAP 2021 - Proceedings of the 11th Knowledge Capture Conference
PB - Association for Computing Machinery (ACM)
Y2 - 2 December 2021 through 3 December 2021
ER -