TY - GEN
T1 - Predicting semantically linkable knowledge in developer online forums via convolutional neural network
AU - Xu, Bowen
AU - Ye, Deheng
AU - Xing, Zhenchang
AU - Xia, Xin
AU - Chen, Guibin
AU - Li, Shanping
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/8/25
Y1 - 2016/8/25
N2 - Consider a question and its answers in Stack Overow as a knowledge unit. Knowledge units often contain semantically relevant knowledge, and thus linkable for different purposes, such as duplicate questions, directly linkable for problem solving, indirectly linkable for related information. Recognising different classes of linkable knowledge would support more targeted information needs when users search or explore the knowledge base. Existing methods focus on binary relatedness (i.e., related or not), and are not robust to recognize different classes of semantic relatedness when linkable knowledge units share few words in common (i.e., have lexical gap). In this paper, we formulate the problem of predicting semantically linkable knowledge units as a multiclass classification problem, and solve the problem using deep learning techniques. To overcome the lexical gap issue, we adopt neural language model (word embeddings) and convolutional neural network (CNN) to capture wordand document-level semantics of knowledge units. Instead of using human-engineered classifier features which are hard to design for informal user-generated content, we exploit large amounts of different types of user-created knowledge-unit links to train the CNN to learn the most informative wordlevel and document-level features for the multiclass classification task. Our evaluation shows that our deep-learning based approach significantly and consistently outperforms traditional methods using traditional word representations and human-engineered classifier features.
AB - Consider a question and its answers in Stack Overow as a knowledge unit. Knowledge units often contain semantically relevant knowledge, and thus linkable for different purposes, such as duplicate questions, directly linkable for problem solving, indirectly linkable for related information. Recognising different classes of linkable knowledge would support more targeted information needs when users search or explore the knowledge base. Existing methods focus on binary relatedness (i.e., related or not), and are not robust to recognize different classes of semantic relatedness when linkable knowledge units share few words in common (i.e., have lexical gap). In this paper, we formulate the problem of predicting semantically linkable knowledge units as a multiclass classification problem, and solve the problem using deep learning techniques. To overcome the lexical gap issue, we adopt neural language model (word embeddings) and convolutional neural network (CNN) to capture wordand document-level semantics of knowledge units. Instead of using human-engineered classifier features which are hard to design for informal user-generated content, we exploit large amounts of different types of user-created knowledge-unit links to train the CNN to learn the most informative wordlevel and document-level features for the multiclass classification task. Our evaluation shows that our deep-learning based approach significantly and consistently outperforms traditional methods using traditional word representations and human-engineered classifier features.
KW - Deep learning
KW - Link prediction
KW - Mining software repositories
KW - Multiclass classification
KW - Semantic relatedness
UR - http://www.scopus.com/inward/record.url?scp=84989182827&partnerID=8YFLogxK
U2 - 10.1145/2970276.2970357
DO - 10.1145/2970276.2970357
M3 - Conference contribution
T3 - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
SP - 51
EP - 62
BT - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
A2 - Khurshid, Sarfraz
A2 - Lo, David
A2 - Apel, Sven
PB - Association for Computing Machinery (ACM)
T2 - 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016
Y2 - 3 September 2016 through 7 September 2016
ER -