See more, know more: Unsupervised video object segmentation with co-attention siamese networks

Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen*, Ling Shao, Fatih Porikli

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    459 Citations (Scopus)

    Abstract

    We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve state-of-the-art deep-learning-based solutions, which primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and effective stages for capturing global correlations and scene context by jointly computing co-attention responses and appending them into a joint feature space. We train COSNet with pairs of video frames, which naturally augments the training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which it leverages to better infer frequently reappearing, salient foreground objects. We propose a unified, end-to-end trainable framework from which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments on three large benchmarks show that COSNet outperforms the current alternatives by a large margin. We will publicly release our implementation and models.
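    The abstract describes co-attention only at a high level. As a rough illustration, the NumPy sketch below assumes a bilinear affinity S = F_b^T W F_a between the flattened features of two frames, softmax-normalises it in each direction so every position of one frame summarises the other frame, gates those summaries with a hypothetical sigmoid gate, and appends them to each frame's own features. The multi-reference step simply averages a query frame's summaries over several reference frames, and training_pairs counts the ordered frame pairs one video yields. All function names, shapes and parameters here are assumptions made for illustration; this is not the authors' released COSNet code.

```python
from itertools import permutations

import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def co_attention(feat_a, feat_b, weight, gate_w, gate_b):
    """Co-attention between two frame embeddings (illustrative sketch only).

    feat_a, feat_b: (C, N) features of frames a and b, N = H * W positions.
    weight:         (C, C) matrix of the assumed bilinear affinity.
    gate_w, gate_b: (C,) vector and scalar of a hypothetical 1x1 sigmoid gate.
    Returns (2C, N) joint features for frame a and for frame b.
    """
    # Affinity between every position of b (rows) and of a (columns).
    affinity = feat_b.T @ weight @ feat_a             # (N_b, N_a)
    # For each position of a, normalise over the positions of b ...
    attn_for_a = softmax(affinity, axis=0)            # (N_b, N_a)
    # ... and for each position of b, normalise over the positions of a.
    attn_for_b = softmax(affinity.T, axis=0)          # (N_a, N_b)
    # Each position summarises the other frame's features it attends to.
    summary_a = feat_b @ attn_for_a                   # (C, N_a)
    summary_b = feat_a @ attn_for_b                   # (C, N_b)
    # Hypothetical per-position gate that scales each summary vector.
    summary_a = summary_a * sigmoid(gate_w @ summary_a + gate_b)
    summary_b = summary_b * sigmoid(gate_w @ summary_b + gate_b)
    # Append the co-attention response to the frame's own features.
    joint_a = np.concatenate([summary_a, feat_a], axis=0)
    joint_b = np.concatenate([summary_b, feat_b], axis=0)
    return joint_a, joint_b


def segment_with_references(query, references, weight, gate_w, gate_b):
    """Average the query's co-attention summaries over several reference frames.

    Assumes simple averaging; a real model would pass the result to a
    segmentation head rather than return it directly.
    """
    channels = query.shape[0]
    # The first C rows of joint_a hold the gated summary for the query.
    summaries = [co_attention(query, ref, weight, gate_w, gate_b)[0][:channels]
                 for ref in references]
    return np.concatenate([np.mean(summaries, axis=0), query], axis=0)


def training_pairs(num_frames):
    """All ordered (a, b) frame index pairs from one video (hypothetical sampler)."""
    return list(permutations(range(num_frames), 2))


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
frames = [rng.standard_normal((C, H * W)) for _ in range(4)]
weight = rng.standard_normal((C, C)) * 0.1
gate_w = rng.standard_normal(C) * 0.1
gate_b = 0.0

joint_a, joint_b = co_attention(frames[0], frames[1], weight, gate_w, gate_b)
fused = segment_with_references(frames[0], frames[1:], weight, gate_w, gate_b)
print(joint_a.shape, joint_b.shape)  # (16, 16) (16, 16)
print(fused.shape)                   # (16, 16)
print(len(training_pairs(5)))        # 20
```

The pair count grows quadratically with clip length (n frames give n(n-1) ordered pairs), which is one way to read the abstract's remark that training on frame pairs naturally augments the data.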

    Original language: English
    Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
    Publisher: IEEE Computer Society
    Pages: 3618-3627
    Number of pages: 10
    ISBN (Electronic): 9781728132938
    Publication status: Published - Jun 2019
    Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
    Duration: 16 Jun 2019 - 20 Jun 2019

    Publication series

    Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    Volume: 2019-June
    ISSN (Print): 1063-6919

    Conference

    Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
    Country/Territory: United States
    City: Long Beach
    Period: 16/06/19 - 20/06/19
