TY - GEN
T1 - Indoor scene parsing with instance segmentation, semantic labeling and support relationship inference
AU - Zhuo, Wei
AU - Salzmann, Mathieu
AU - He, Xuming
AU - Liu, Miaomiao
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Over the years, indoor scene parsing has attracted a growing interest in the computer vision community. Existing methods have typically focused on diverse subtasks of this challenging problem. In particular, while some of them aim at segmenting the image into regions, such as object or surface instances, others aim at inferring the semantic labels of given regions, or their support relationships. These different tasks are typically treated as separate ones. However, they bear strong connections: good regions should respect the semantic labels; support can only be defined for meaningful regions; support relationships strongly depend on semantics. In this paper, we therefore introduce an approach to jointly segment the instances and infer their semantic labels and support relationships from a single input image. By exploiting a hierarchical segmentation, we formulate our problem as that of jointly finding the regions in the hierarchy that correspond to instances and estimating their class labels and pairwise support relationships. We express this via a Markov Random Field, which allows us to further encode links between the different types of variables. Inference in this model can be done exactly via integer linear programming, and we learn its parameters in a structural SVM framework. Our experiments on NYUv2 demonstrate the benefits of reasoning jointly about all these subtasks of indoor scene parsing.
AB - Over the years, indoor scene parsing has attracted a growing interest in the computer vision community. Existing methods have typically focused on diverse subtasks of this challenging problem. In particular, while some of them aim at segmenting the image into regions, such as object or surface instances, others aim at inferring the semantic labels of given regions, or their support relationships. These different tasks are typically treated as separate ones. However, they bear strong connections: good regions should respect the semantic labels; support can only be defined for meaningful regions; support relationships strongly depend on semantics. In this paper, we therefore introduce an approach to jointly segment the instances and infer their semantic labels and support relationships from a single input image. By exploiting a hierarchical segmentation, we formulate our problem as that of jointly finding the regions in the hierarchy that correspond to instances and estimating their class labels and pairwise support relationships. We express this via a Markov Random Field, which allows us to further encode links between the different types of variables. Inference in this model can be done exactly via integer linear programming, and we learn its parameters in a structural SVM framework. Our experiments on NYUv2 demonstrate the benefits of reasoning jointly about all these subtasks of indoor scene parsing.
UR - http://www.scopus.com/inward/record.url?scp=85044520273&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.664
DO - 10.1109/CVPR.2017.664
M3 - Conference contribution
T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
SP - 6269
EP - 6275
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Y2 - 21 July 2017 through 26 July 2017
ER -