TY - JOUR
T1 - Automatic refinement strategies for manual initialization of object trackers
AU - Zhu, Hao
AU - Porikli, Fatih
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2017/2
Y1 - 2017/2
N2 - Tracking objects across multiple frames is a well-investigated problem in computer vision. The majority of existing algorithms assume that an accurate initialization is readily available. However, in many real-life settings, in particular for applications where the video is streaming in real time, the initialization has to be provided by a human operator. This limitation inevitably introduces uncertainty. Here, we first collect a large, new data set of inputs consisting of more than 20 K human initialization clicks, made by several subjects under three practical user interface scenarios for the popular TB50 tracking benchmark. We analyze the factors and mechanisms of human input, derive statistical models, and show that human input always contains deviations, which grow further when the relative object-camera motion becomes large. We also design and evaluate alternative refinement schemes, and propose a strategy that refits an object window on the most probable target region after a single click. To compensate for the human initialization errors, our method generates window proposals using objectness cues extracted from color and motion attributes, accumulates them into a likelihood map that is weighted by the initial click position and visual saliency scores, and assigns the final window by the maximum likelihood estimate. Our experiments demonstrate that the presented refinement strategy effectively reduces human input errors.
AB - Tracking objects across multiple frames is a well-investigated problem in computer vision. The majority of existing algorithms assume that an accurate initialization is readily available. However, in many real-life settings, in particular for applications where the video is streaming in real time, the initialization has to be provided by a human operator. This limitation inevitably introduces uncertainty. Here, we first collect a large, new data set of inputs consisting of more than 20 K human initialization clicks, made by several subjects under three practical user interface scenarios for the popular TB50 tracking benchmark. We analyze the factors and mechanisms of human input, derive statistical models, and show that human input always contains deviations, which grow further when the relative object-camera motion becomes large. We also design and evaluate alternative refinement schemes, and propose a strategy that refits an object window on the most probable target region after a single click. To compensate for the human initialization errors, our method generates window proposals using objectness cues extracted from color and motion attributes, accumulates them into a likelihood map that is weighted by the initial click position and visual saliency scores, and assigns the final window by the maximum likelihood estimate. Our experiments demonstrate that the presented refinement strategy effectively reduces human input errors.
KW - Object initialization
KW - error compensation
KW - human-computer interactive
KW - object tracking
UR - http://www.scopus.com/inward/record.url?scp=85015231094&partnerID=8YFLogxK
U2 - 10.1109/TIP.2016.2633874
DO - 10.1109/TIP.2016.2633874
M3 - Article
SN - 1057-7149
VL - 26
SP - 821
EP - 835
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 2
M1 - 7762927
ER -