TY - GEN
T1 - Robust online visual tracking with a single convolutional neural network
AU - Li, Hanxi
AU - Li, Yi
AU - Porikli, Fatih
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Despite their great success at feature learning in various computer vision tasks, deep neural networks are usually considered impractical for online visual tracking because they require very long training time and a large number of training samples. In this work, we present an efficient and very robust online tracking algorithm using a single Convolutional Neural Network (CNN) for learning effective feature representations of the target object over time. Our contributions are multifold: First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation, and thus drift, by accommodating the uncertainty of the model output. Second, we enhance the ordinary Stochastic Gradient Descent approach in CNN training with a temporal selection mechanism, which generates positive and negative samples within different time periods. Finally, we propose to update the CNN model in a “lazy” style to speed up the training stage, where the network is updated only when a significant appearance change occurs on the object, without sacrificing tracking accuracy. The CNN tracker outperforms all compared state-of-the-art methods in our extensive evaluations that involve 18 well-known benchmark video sequences.
AB - Despite their great success at feature learning in various computer vision tasks, deep neural networks are usually considered impractical for online visual tracking because they require very long training time and a large number of training samples. In this work, we present an efficient and very robust online tracking algorithm using a single Convolutional Neural Network (CNN) for learning effective feature representations of the target object over time. Our contributions are multifold: First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation, and thus drift, by accommodating the uncertainty of the model output. Second, we enhance the ordinary Stochastic Gradient Descent approach in CNN training with a temporal selection mechanism, which generates positive and negative samples within different time periods. Finally, we propose to update the CNN model in a “lazy” style to speed up the training stage, where the network is updated only when a significant appearance change occurs on the object, without sacrificing tracking accuracy. The CNN tracker outperforms all compared state-of-the-art methods in our extensive evaluations that involve 18 well-known benchmark video sequences.
UR - http://www.scopus.com/inward/record.url?scp=84929610418&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-16814-2_13
DO - 10.1007/978-3-319-16814-2_13
M3 - Conference contribution
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 194
EP - 209
BT - Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers
A2 - Cremers, Daniel
A2 - Saito, Hideo
A2 - Reid, Ian
A2 - Yang, Ming-Hsuan
PB - Springer Verlag
T2 - 12th Asian Conference on Computer Vision, ACCV 2014
Y2 - 1 November 2014 through 5 November 2014
ER -