Learning to extract API mentions from informal natural language discussions

Deheng Ye, Zhenchang Xing, Chee Yong Foo, Jing Li, Nachiket Kapre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

44 Citations (Scopus)

Abstract

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language texts. Extracting API mentions from natural language texts is a prerequisite for effective indexing and searching of API-related information in software engineering social content. However, the informal nature of social discussions creates two fundamental challenges for API extraction: common-word polysemy and sentence-format variations. Common-word polysemy refers to the ambiguity between the API sense of a common word and the normal sense of the word (e.g., append, apply and merge). Sentence-format variations refer to the lack of a consistent sentence writing format from which API mentions can be inferred. Existing API extraction techniques fall short of addressing these two challenges, because they assume distinct API naming conventions (e.g., camel case, underscore) or a structured sentence format (e.g., code-like phrase, API annotation, or full API name). In this paper, we propose a semi-supervised machine-learning approach that exploits name synonyms and the rich semantic context of API mentions to extract API mentions in informal social text. The key innovation of our approach is to exploit two complementary unsupervised language models, learned from abundant unlabeled text, to model sentence-format variations, and to train a robust model with a small set of labeled data and an iterative self-training process. An evaluation on 1,205 API mentions of three libraries (Pandas, Numpy, and Matplotlib) in Stack Overflow texts shows that our approach significantly outperforms existing API extraction techniques based on language-convention and sentence-format heuristics, as well as our earlier machine-learning-based method for named-entity recognition.
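To make the limitation of convention-based extraction concrete, here is a minimal Python sketch of the kind of surface-form heuristic the abstract describes (dotted names, trailing parentheses, underscores, camel case). It is a hypothetical illustration, not the paper's method and not any specific published baseline; the function name `heuristic_api_mentions` and all regular expressions are assumptions chosen for the example.

```python
import re

# Hypothetical surface-form cues, assumed for illustration only.
DOTTED = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+(?:\(\))?")  # pd.merge, df.join()
CALL = re.compile(r"[A-Za-z_]\w*\(\)")                            # append()
UNDERSCORE = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+")        # read_csv
CAMEL = re.compile(r"[A-Za-z][a-z0-9]*(?:[A-Z][a-z0-9]+)+")       # DataFrame, groupBy

PATTERNS = (DOTTED, CALL, UNDERSCORE, CAMEL)


def heuristic_api_mentions(sentence: str) -> list[str]:
    """Return tokens that look like API names by naming convention alone.

    The sentence's context is never consulted, so a common word such as
    "merge" is never flagged, whatever sense it carries.
    """
    mentions = []
    for raw in sentence.split():
        # Drop surrounding punctuation and backticks, then a sentence-final period.
        token = raw.strip(",;:!?\"'`").rstrip(".")
        if any(p.fullmatch(token) for p in PATTERNS):
            mentions.append(token)
    return mentions


print(heuristic_api_mentions("Use pd.merge or df.join() to combine the frames."))
# ['pd.merge', 'df.join()']
print(heuristic_api_mentions("You can merge the two frames and then apply a function to each row."))
# []
print(heuristic_api_mentions("I want to merge my branch before the release."))
# []
```

The last two sentences both yield nothing, even though `merge` and `apply` plausibly refer to the Pandas APIs in the first and clearly do not in the second. Separating the two senses requires the surrounding context, which convention-based rules ignore and which the abstract's learned approach is designed to exploit.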

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-399
Number of pages: 11
ISBN (Electronic): 9781509038060
DOIs
Publication status: Published - 12 Jan 2017
Externally published: Yes
Event: 32nd IEEE International Conference on Software Maintenance and Evolution, ICSME 2016 - Raleigh, United States
Duration: 2 Oct 2016 to 10 Oct 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016

Conference

Conference: 32nd IEEE International Conference on Software Maintenance and Evolution, ICSME 2016
Country/Territory: United States
City: Raleigh
Period: 2/10/16 to 10/10/16
