Convolutional neural net bagging for online visual tracking

Hanxi Li, Yi Li*, Fatih Porikli

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    25 Citations (Scopus)

    Abstract

    Recently, Convolutional Neural Nets (CNNs) have been successfully applied to online visual tracking. A major problem, however, is that such models are prone to over-fitting, for two main reasons. The first is label noise: online training relies solely on the detections from previous frames. The second is model uncertainty introduced by the randomized training strategy. In this work, we cope with both noisy labels and model uncertainty within the framework of bagging (bootstrap aggregating), resulting in efficient and effective visual tracking. Instead of using multiple models in a bag, we design a single multitask CNN that learns effective feature representations of the target object. In our model, each task has the same structure and shares the same set of convolutional features, but is trained on a different set of random samples. A significant advantage is that the bagging overhead of our model is minimal, and no extra effort is needed to combine the outputs of different tasks, as is required in multi-lifespan models. Experiments demonstrate that our CNN tracker outperforms state-of-the-art methods on three recent benchmarks (over 80 video sequences), illustrating the superiority of the feature representations learned by our purely online bagging framework.
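
    The bagging scheme described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the authors' implementation: the layer sizes, the number of heads, and the two-class (target vs. background) output are all illustrative assumptions. Only the structure follows the abstract: one shared convolutional feature extractor, several identically structured heads ("tasks") each trained on its own bootstrap resample of the online training data, and an average over head outputs at test time.

    ```python
    import torch
    import torch.nn as nn

    class BaggedCNN(nn.Module):
        """One shared convolutional backbone feeding K identical heads."""

        def __init__(self, num_heads: int = 4):
            super().__init__()
            # Shared convolutional feature extractor (sizes are illustrative).
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # K heads ("tasks") with the same structure but independent weights.
            self.heads = nn.ModuleList(nn.Linear(64, 2) for _ in range(num_heads))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Bagging at test time: average the per-head scores.
            feats = self.features(x)
            return torch.stack([head(feats) for head in self.heads]).mean(dim=0)

    def online_update(model, patches, labels, optimizer, loss_fn):
        """One online training step: each head is trained on its own
        bootstrap resample of the (possibly label-noisy) patches collected
        from previous frames, while the backbone is shared by all heads."""
        optimizer.zero_grad()
        feats = model.features(patches)
        loss = patches.new_zeros(())
        for head in model.heads:
            idx = torch.randint(0, len(patches), (len(patches),))  # resample
            loss = loss + loss_fn(head(feats[idx]), labels[idx])
        loss.backward()
        optimizer.step()
        return loss.item()
    ```

    Because all heads share one forward pass through the backbone, the extra cost of bagging is only the small per-head layers; for example, one update could be run as `online_update(model, patches, labels, torch.optim.SGD(model.parameters(), lr=1e-3), nn.CrossEntropyLoss())`.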

    Original language: English
    Pages (from-to): 120-129
    Number of pages: 10
    Journal: Computer Vision and Image Understanding
    Volume: 153
    DOIs
    Publication status: Published - 1 Dec 2016

