TY - GEN
T1 - Channel Recurrent Attention Networks for Video Pedestrian Retrieval
AU - Fang, Pengfei
AU - Ji, Pan
AU - Zhou, Jieming
AU - Petersson, Lars
AU - Harandi, Mehrtash
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Full attention, which generates an attention value per element of the input feature maps, has been demonstrated to be beneficial in visual tasks. In this work, we propose a fully attentional network, termed channel recurrent attention network, for the task of video pedestrian retrieval. The main attention unit, channel recurrent attention, identifies attention maps at the frame level by jointly leveraging spatial and channel patterns via a recurrent neural network. This channel recurrent attention is designed to build a global receptive field by recurrently receiving and learning the spatial vectors. Then, a set aggregation cell is employed to generate a compact video representation. Experimental results demonstrate the superior performance of the proposed deep network, which outperforms current state-of-the-art results across standard video person retrieval benchmarks, and a thorough ablation study shows the effectiveness of the proposed units.
AB - Full attention, which generates an attention value per element of the input feature maps, has been demonstrated to be beneficial in visual tasks. In this work, we propose a fully attentional network, termed channel recurrent attention network, for the task of video pedestrian retrieval. The main attention unit, channel recurrent attention, identifies attention maps at the frame level by jointly leveraging spatial and channel patterns via a recurrent neural network. This channel recurrent attention is designed to build a global receptive field by recurrently receiving and learning the spatial vectors. Then, a set aggregation cell is employed to generate a compact video representation. Experimental results demonstrate the superior performance of the proposed deep network, which outperforms current state-of-the-art results across standard video person retrieval benchmarks, and a thorough ablation study shows the effectiveness of the proposed units.
KW - Channel recurrent attention
KW - Full attention
KW - Global receptive field
KW - Pedestrian retrieval
KW - Set aggregation
UR - http://www.scopus.com/inward/record.url?scp=85103257490&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-69544-6_26
DO - 10.1007/978-3-030-69544-6_26
M3 - Conference contribution
SN - 9783030695439
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 427
EP - 443
BT - Computer Vision – ACCV 2020 – 15th Asian Conference on Computer Vision, 2020, Revised Selected Papers
A2 - Ishikawa, Hiroshi
A2 - Liu, Cheng-Lin
A2 - Pajdla, Tomas
A2 - Shi, Jianbo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 15th Asian Conference on Computer Vision, ACCV 2020
Y2 - 30 November 2020 through 4 December 2020
ER -