TY - JOUR
T1 - Attention in Attention Networks for Person Retrieval
AU - Fang, Pengfei
AU - Zhou, Jieming
AU - Roy, Soumava Kumar
AU - Ji, Pan
AU - Petersson, Lars
AU - Harandi, Mehrtash
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/9/1
Y1 - 2022/9/1
N2 - This paper generalizes the Attention in Attention (AiA) mechanism of P. Fang et al. (2019) by employing explicit mappings into reproducing kernel Hilbert spaces to generate attention values for the input feature map. The AiA mechanism builds inter-dependencies between local and global features through the interaction of inner and outer attention modules. Besides a vanilla AiA module, termed linear attention with AiA, two non-linear counterparts, namely second-order polynomial attention and Gaussian attention, are proposed to explicitly exploit the non-linear properties of the input features via the second-order polynomial kernel and an approximation of the Gaussian kernel. The deep convolutional neural network equipped with the proposed AiA blocks is referred to as the Attention in Attention Network (AiA-Net). AiA-Net learns to extract a discriminative pedestrian representation that combines complementary person appearance and part features. Extensive ablation studies verify the effectiveness of the AiA mechanism and the use of non-linear features hidden in the feature map for attention design. Furthermore, our approach outperforms the current state of the art by a considerable margin across a number of benchmarks. In addition, state-of-the-art performance is also achieved on the video person retrieval task with the proposed AiA blocks.
AB - This paper generalizes the Attention in Attention (AiA) mechanism of P. Fang et al. (2019) by employing explicit mappings into reproducing kernel Hilbert spaces to generate attention values for the input feature map. The AiA mechanism builds inter-dependencies between local and global features through the interaction of inner and outer attention modules. Besides a vanilla AiA module, termed linear attention with AiA, two non-linear counterparts, namely second-order polynomial attention and Gaussian attention, are proposed to explicitly exploit the non-linear properties of the input features via the second-order polynomial kernel and an approximation of the Gaussian kernel. The deep convolutional neural network equipped with the proposed AiA blocks is referred to as the Attention in Attention Network (AiA-Net). AiA-Net learns to extract a discriminative pedestrian representation that combines complementary person appearance and part features. Extensive ablation studies verify the effectiveness of the AiA mechanism and the use of non-linear features hidden in the feature map for attention design. Furthermore, our approach outperforms the current state of the art by a considerable margin across a number of benchmarks. In addition, state-of-the-art performance is also achieved on the video person retrieval task with the proposed AiA blocks.
KW - Attention in attention mechanism
KW - Gaussian kernel
KW - convolutional neural network
KW - pedestrian representation
KW - person retrieval
KW - second-order polynomial kernel
UR - http://www.scopus.com/inward/record.url?scp=85104651366&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3073512
DO - 10.1109/TPAMI.2021.3073512
M3 - Article
SN - 0162-8828
VL - 44
SP - 4626
EP - 4641
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
ER -