TY - GEN
T1 - Who Will Leave the Company?
T2 - 14th IEEE/ACM International Conference on Mining Software Repositories, MSR 2017
AU - Bao, Lingfeng
AU - Xing, Zhenchang
AU - Xia, Xin
AU - Lo, David
AU - Li, Shanping
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/29
Y1 - 2017/6/29
AB - Software developer turnover has become a big challenge for information technology (IT) companies. The departure of key software developers might cause a big loss to an IT company, since they also depart with important business knowledge and critical technical skills. Understanding developer turnover is very important for IT companies to retain talented developers and reduce the loss due to developers' departure. Previous studies mainly perform qualitative observations or simple statistical analysis of developers' activity data to understand developer turnover. In this paper, we investigate whether we can predict the turnover of software developers in non-open source companies by automatically analyzing monthly self-reports. The monthly work reports in our study are from two IT companies. Monthly reports in these two companies are used to report a developer's activities and working hours in a month. We would like to investigate whether a developer will leave the company after he/she enters the company for one year, based on his/her first six monthly reports. To perform our prediction, we extract many factors from monthly reports, which are grouped into 6 dimensions. We apply several classifiers, including naive Bayes, SVM, decision tree, kNN and random forest. We conduct an experiment on about six years of monthly reports from two companies; this data contains 3,638 developers over time. We find that the random forest classifier achieves the best performance, with an F1-measure of 0.86 for retained developers and an F1-measure of 0.65 for not-retained developers. We also investigate the relationship between our proposed factors and developers' departure, and the important factors that indicate a developer's departure. We find that the content of task reports in monthly reports, the standard deviation of working hours, and the standard deviation of working hours of project members in the first month are the top three important factors.
KW - Developer turnover
KW - Mining software repositories
KW - Prediction model
UR - http://www.scopus.com/inward/record.url?scp=85026554861&partnerID=8YFLogxK
U2 - 10.1109/MSR.2017.58
DO - 10.1109/MSR.2017.58
M3 - Conference contribution
T3 - IEEE International Working Conference on Mining Software Repositories
SP - 170
EP - 181
BT - Proceedings - 2017 IEEE/ACM 14th International Conference on Mining Software Repositories, MSR 2017
PB - IEEE Computer Society
Y2 - 20 May 2017 through 21 May 2017
ER -