Software-specific named entity recognition in software engineering social content

Deheng Ye, Zhenchang Xing, Chee Yong Foo, Zi Qun Ang, Jing Li, Nachiket Kapre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

82 Citations (Scopus)

Abstract

Software engineering social content, such as Q&A discussions on Stack Overflow, has become a wealth of information on software engineering. This textual content is centered around software-specific entities and their usage patterns, issues and solutions, and alternatives. However, existing approaches to analyzing software engineering texts treat software-specific entities in the same way as other content, and thus cannot support the recent advance of entity-centric applications, such as direct answers and knowledge graphs. The first step towards enabling these entity-centric applications for software engineering is to recognize and classify software-specific entities, which is referred to as Named Entity Recognition (NER) in the literature. Existing NER methods are designed for recognizing persons, locations, and organizations in formal and social texts, and are not applicable to NER in software engineering. Existing information extraction methods for software engineering are limited to API identification and linking for a particular programming language. In this paper, we formulate the research problem of NER in software engineering. We identify the challenges in designing a software-specific NER system and propose a machine learning based approach applied to software engineering social content. Our NER system, called S-NER, is general for software engineering in that it can recognize a broad category of software entities for a wide range of popular programming languages, platforms, and libraries. We conduct systematic experiments to evaluate our machine learning based S-NER against a well-designed rule-based baseline system, and to study the effectiveness of widely adopted NER techniques and features in the face of the unique characteristics of software engineering social content.

Original language: English
Title of host publication: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 90-101
Number of pages: 12
ISBN (Electronic): 9781509018550
DOIs
Publication status: Published - 20 May 2016
Externally published: Yes
Event: 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016 - Suita, Osaka, Japan
Duration: 14 Mar 2016 to 18 Mar 2016

Publication series

Name: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
Volume: 1

Conference

Conference: 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016
Country/Territory: Japan
City: Suita, Osaka
Period: 14/03/16 to 18/03/16
