Robust Video Object Cosegmentation

Wenguan Wang, Jianbing Shen*, Xuelong Li, Fatih Porikli

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    132 Citations (Scopus)

    Abstract

    With ever-increasing volumes of video data, automatic extraction of salient object regions has become even more significant for visual analytic solutions. This surge has also opened up opportunities to exploit the collective cues encapsulated in multiple videos in a cooperative manner. However, it also brings major challenges, such as handling drastic variations in the appearance, motion pattern, and pose of foreground objects, as well as indiscriminate backgrounds. Here, we present a cosegmentation framework that jointly discovers and segments common object regions across multiple frames and multiple videos. We incorporate three types of cues, i.e., intraframe saliency, interframe consistency, and across-video similarity, into an energy optimization framework that makes no restrictive assumptions about foreground appearance or motion model and does not require objects to be visible in all frames. We also introduce a spatio-temporal scale-invariant feature transform (SIFT) flow descriptor that integrates across-video correspondence from conventional SIFT flow into interframe motion flow from optical flow. This novel spatio-temporal SIFT flow generates reliable estimates of common foregrounds over the entire video data set. Experimental results show that our method outperforms state-of-the-art methods on a new, extensive data set (ViCoSeg).
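To make the idea of combining three cues in one energy concrete, here is a minimal sketch in Python. It is not the formulation used in the paper, which the abstract does not spell out. It assumes each region already has three foreground scores in [0, 1], one per cue, and that the cues are fused by a weighted average and turned into a per-region (unary) energy. The function names, the weights, and the toy scores are all hypothetical, and the sketch omits any pairwise smoothness term and the spatio-temporal SIFT flow itself.

```python
import math

def combine_cues(saliency, consistency, similarity, weights=(1.0, 1.0, 1.0)):
    """Fuse three per-region foreground scores (each in [0, 1]) by a weighted average.

    The equal default weights are an illustrative assumption, not values from the paper.
    """
    w_s, w_c, w_v = weights
    total = w_s + w_c + w_v
    return [
        (w_s * s + w_c * c + w_v * v) / total
        for s, c, v in zip(saliency, consistency, similarity)
    ]

def unary_energy(p, label, eps=1e-6):
    """Negative log-likelihood cost of assigning `label` (1 = foreground, 0 = background)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def label_regions(scores):
    """Pick, for each region independently, the label with the lower unary energy."""
    return [min((0, 1), key=lambda lab: unary_energy(p, lab)) for p in scores]

# Toy scores for three regions (hypothetical values).
saliency = [0.9, 0.2, 0.6]      # intraframe saliency
consistency = [0.8, 0.1, 0.7]   # interframe consistency
similarity = [0.7, 0.3, 0.1]    # across-video similarity

scores = combine_cues(saliency, consistency, similarity)
labels = label_regions(scores)
```

In this toy example, only the first region scores highly on all three cues, so it is the only one labeled foreground. The third region is salient and temporally consistent, but its low across-video similarity pulls it to background. A full energy would normally add a smoothness term between neighboring regions and be minimized jointly rather than region by region.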

    Original language: English
    Article number: 7113836
    Pages (from-to): 3137-3148
    Number of pages: 12
    Journal: IEEE Transactions on Image Processing
    Volume: 24
    Issue number: 10
    Publication status: Published - 1 Oct 2015
