TY - GEN
T1 - Software-specific named entity recognition in software engineering social content
AU - Ye, Deheng
AU - Xing, Zhenchang
AU - Foo, Chee Yong
AU - Ang, Zi Qun
AU - Li, Jing
AU - Kapre, Nachiket
N1 - Publisher Copyright:
© 2016 IEEE
PY - 2016/5/20
Y1 - 2016/5/20
N2 - Software engineering social content, such as Q&A discussions on Stack Overflow, has become a wealth of information on software engineering. This textual content is centered around software-specific entities, and their usage patterns, issues-solutions, and alternatives. However, existing approaches to analyzing software engineering texts treat software-specific entities in the same way as other content, and thus cannot support the recent advance of entity-centric applications, such as direct answers and knowledge graph. The first step towards enabling these entity-centric applications for software engineering is to recognize and classify software-specific entities, which is referred to as Named Entity Recognition (NER) in the literature. Existing NER methods are designed for recognizing person, location and organization in formal and social texts, which are not applicable to NER in software engineering. Existing information extraction methods for software engineering are limited to API identification and linking of a particular programming language. In this paper, we formulate the research problem of NER in software engineering. We identify the challenges in designing a software-specific NER system and propose a machine learning based approach applied on software engineering social content. Our NER system, called S-NER, is general for software engineering in that it can recognize a broad category of software entities for a wide range of popular programming languages, platform, and library. We conduct systematic experiments to evaluate our machine learning based S-NER against a well-designed rule-based baseline system, and to study the effectiveness of widely-adopted NER techniques and features in the face of the unique characteristics of software engineering social content.
AB - Software engineering social content, such as Q&A discussions on Stack Overflow, has become a wealth of information on software engineering. This textual content is centered around software-specific entities, and their usage patterns, issues-solutions, and alternatives. However, existing approaches to analyzing software engineering texts treat software-specific entities in the same way as other content, and thus cannot support the recent advance of entity-centric applications, such as direct answers and knowledge graph. The first step towards enabling these entity-centric applications for software engineering is to recognize and classify software-specific entities, which is referred to as Named Entity Recognition (NER) in the literature. Existing NER methods are designed for recognizing person, location and organization in formal and social texts, which are not applicable to NER in software engineering. Existing information extraction methods for software engineering are limited to API identification and linking of a particular programming language. In this paper, we formulate the research problem of NER in software engineering. We identify the challenges in designing a software-specific NER system and propose a machine learning based approach applied on software engineering social content. Our NER system, called S-NER, is general for software engineering in that it can recognize a broad category of software entities for a wide range of popular programming languages, platform, and library. We conduct systematic experiments to evaluate our machine learning based S-NER against a well-designed rule-based baseline system, and to study the effectiveness of widely-adopted NER techniques and features in the face of the unique characteristics of software engineering social content.
UR - http://www.scopus.com/inward/record.url?scp=85113855758&partnerID=8YFLogxK
U2 - 10.1109/SANER.2016.10
DO - 10.1109/SANER.2016.10
M3 - Conference contribution
T3 - 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
SP - 90
EP - 101
BT - 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
Y2 - 14 March 2016 through 18 March 2016
ER -