TY - JOUR
T1 - A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification
AU - Hayat, Munawar
AU - Khan, Salman H.
AU - Bennamoun, Mohammed
AU - An, Senjian
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/10
Y1 - 2016/10
AB - Unlike standard object classification, where the image to be classified contains one or more instances of the same object, indoor scene classification deals with images that consist of multiple distinct objects. Furthermore, these objects can vary in size and appear at numerous spatial locations in different layouts. For automatic indoor scene categorization, large-scale spatial layout deformations and scale variations are therefore two major challenges, and the design of rich feature descriptors that are robust to these challenges is still an open problem. This paper introduces a new learnable feature descriptor called 'spatial layout and scale invariant convolutional activations' to deal with these challenges. For this purpose, a new convolutional neural network architecture is designed which incorporates a novel 'spatially unstructured' layer to introduce robustness against spatial layout deformations. To achieve scale invariance, we present a pyramidal image representation. For feasible training of the proposed network on images of indoor scenes, this paper proposes a methodology that efficiently adapts a network model trained on large-scale data to our task with only a limited amount of available training data. The efficacy of the proposed approach is demonstrated through extensive experiments on a number of data sets, including MIT-67, Scene-15, Sports-8, Graz-02, and NYU.
KW - Indoor scenes classification
KW - scale invariance
KW - spatial layout variations
UR - http://www.scopus.com/inward/record.url?scp=84986208898&partnerID=8YFLogxK
U2 - 10.1109/TIP.2016.2599292
DO - 10.1109/TIP.2016.2599292
M3 - Article
SN - 1057-7149
VL - 25
SP - 4829
EP - 4841
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 10
M1 - 7539697
ER -