TY - GEN
T1 - Leveraging Side Information to Improve Label Quality Control in Crowd-Sourcing
AU - Jin, Yuan
AU - Carman, Mark
AU - Kim, Dongwoo
AU - Xie, Lexing
N1 - Publisher Copyright: Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017/10/27
Y1 - 2017/10/27
N2 - We investigate the possibility of leveraging side information to improve quality control over crowd-sourced data. We extend the GLAD model, which governs the probability of correct labeling through a logistic function in which worker expertise counteracts item difficulty, by systematically encoding different types of side information, including worker information drawn from demographics and personality traits, item information drawn from item genres and content, and contextual information drawn from worker responses and labeling sessions. Modeling side information allows for better estimation of worker expertise and item difficulty in sparse data situations and accounts for worker biases, leading to better prediction of posterior true label probabilities. We demonstrate the efficacy of the proposed framework with overall improvements in both true label prediction and unseen worker response prediction, based on different combinations of the various types of side information, across three new crowd-sourcing datasets. In addition, we show that the framework has the potential to identify salient side information features for predicting the correctness of responses without requiring any true label information.
UR - http://www.scopus.com/inward/record.url?scp=85079540117&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 5th AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017
SP - 79
EP - 88
BT - Proceedings of the 5th AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017
A2 - Dow, Steven
A2 - Kalai, Adam Tauman
PB - AAAI Press
T2 - 5th AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017
Y2 - 24 October 2017 through 26 October 2017
ER -