TY - JOUR
T1 - Incorporating Network Built-in Priors in Weakly-Supervised Semantic Segmentation
AU - Saleh, Fatemeh Sadat
AU - Aliakbarian, Mohammad Sadegh
AU - Salzmann, Mathieu
AU - Petersson, Lars
AU - Alvarez, Jose M.
AU - Gould, Stephen
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2018/6/1
Y1 - 2018/6/1
N2 - Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately, these priors either require pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by the fusion of foreground/background ones with information extracted from a weakly-supervised localization network. Our experiments evidence that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art tag-based weakly-supervised semantic segmentation results.
AB - Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately, these priors either require pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by the fusion of foreground/background ones with information extracted from a weakly-supervised localization network. Our experiments evidence that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art tag-based weakly-supervised semantic segmentation results.
KW - Semantic segmentation
KW - convolutional neural networks
KW - weak annotations
KW - weakly-supervised semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85046861866&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2017.2713785
DO - 10.1109/TPAMI.2017.2713785
M3 - Article
SN - 0162-8828
VL - 40
SP - 1382
EP - 1396
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 6
ER -