Multi-class semantic video segmentation with exemplar-based object reasoning

Buyu Liu, Xuming He, Stephen Gould

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    11 Citations (Scopus)

    Abstract

    We tackle the problem of semantic segmentation of dynamic scenes in video sequences. We propose to incorporate foreground object information into pixel labeling by jointly reasoning about the semantic labels of super-voxels, object instance tracks, and geometric relations between objects. We take an exemplar approach to object modeling, using a small set of object annotations and exploiting the temporal consistency of object motion. After generating a set of moving object hypotheses, we design a CRF framework that jointly models the super-voxels and object instances. The optimal semantic labeling is inferred by MAP estimation of the model, which is solved by a single move-making optimization procedure. We demonstrate the effectiveness of our method on three public datasets and show that our model achieves results superior or comparable to the state of the art with less object-level supervision.

    Original language: English
    Title of host publication: Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1014-1021
    Number of pages: 8
    ISBN (Electronic): 9781479966820
    Publication status: Published - 19 Feb 2015
    Event: 2015 15th IEEE Winter Conference on Applications of Computer Vision, WACV 2015 - Waikoloa, United States
    Duration: 5 Jan 2015 to 9 Jan 2015

    Publication series

    Name: Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015

    Conference

    Conference: 2015 15th IEEE Winter Conference on Applications of Computer Vision, WACV 2015
    Country/Territory: United States
    City: Waikoloa
    Period: 5/01/15 to 9/01/15
