TY - GEN
T1 - An effective and efficient truth discovery framework over data streams
AU - Li, Tianyi
AU - Gu, Yu
AU - Zhou, Xiangmin
AU - Ma, Qian
AU - Yu, Ge
N1 - Publisher Copyright:
© 2017, Copyright is with the authors.
PY - 2017
Y1 - 2017
AB - Truth discovery, a method for assessing the validity of conflicting data from various sources, has been widely studied in the conventional database community. However, existing methods for static scenarios involve time-consuming iterative processes, while those for streams sacrifice considerable accuracy because they learn source weights incrementally. In this paper, we propose a novel framework for truth discovery over streams that incorporates various iterative methods to estimate source weights effectively and adaptively decides how frequently source weights are computed. Specifically, we first capture the characteristics of source weight evolution and model the framework on that basis. Then, we define the conditions on source weight evolution under which the unit and cumulative errors remain relatively small, and construct a probabilistic model that estimates the probability of meeting these conditions. Finally, we propose a novel scheme called adaptive source reliability assessment (ASRA), which converts the estimation problem into an optimization problem. Extensive experiments on real datasets demonstrate the high effectiveness and efficiency of our framework.
KW - Data quality
KW - Data streams
KW - Source reliability
KW - Truth discovery
UR - http://www.scopus.com/inward/record.url?scp=85046495572&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2017.17
DO - 10.5441/002/edbt.2017.17
M3 - Conference contribution
T3 - Advances in Database Technology - EDBT
SP - 180
EP - 191
BT - Advances in Database Technology - EDBT 2017
A2 - Mitschang, Bernhard
A2 - Markl, Volker
A2 - Bress, Sebastian
A2 - Andritsos, Periklis
A2 - Sattler, Kai-Uwe
A2 - Orlando, Salvatore
PB - OpenProceedings.org
T2 - 20th International Conference on Extending Database Technology, EDBT 2017
Y2 - 21 March 2017 through 24 March 2017
ER -