TY - JOUR
T1 - Spatial encoding of visual words for image classification
AU - Liu, Dong
AU - Wang, Shengsheng
AU - Porikli, Fatih
N1 - Publisher Copyright: © 2016 SPIE and IS&T.
PY - 2016/5/1
Y1 - 2016/5/1
AB - Appearance-based bag-of-visual words (BoVW) models are employed to represent the frequency of a vocabulary of local features in an image. Due to their versatility, they are widely popular, although they ignore the underlying spatial context and relationships among the features. Here, we present a unified representation that enhances BoVWs with explicit local and global structure models. Three aspects of our method should be noted in comparison to the previous approaches. First, we use a local structure feature that encodes the spatial attributes between a pair of points in a discriminative fashion using class-label information. We introduce a bag-of-structural words (BoSW) model for the given image set and describe each image with this model on its coarsely sampled relevant keypoints. We then combine the codebook histograms of BoVW and BoSW to train a classifier. Rigorous experimental evaluations on four benchmark data sets demonstrate that the unified representation outperforms the conventional models and compares favorably to more sophisticated scene classification techniques.
KW - bag-of-words
KW - scene classification
KW - spatial feature representations
KW - visual descriptors
UR - http://www.scopus.com/inward/record.url?scp=84973473733&partnerID=8YFLogxK
U2 - 10.1117/1.JEI.25.3.033008
DO - 10.1117/1.JEI.25.3.033008
M3 - Article
SN - 1017-9909
VL - 25
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
IS - 3
M1 - 033008
ER -