TY - GEN
T1 - Attention-based pyramid aggregation network for visual place recognition
AU - Zhu, Yingying
AU - Xie, Lingxi
AU - Wang, Jiong
AU - Zheng, Liang
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/10/15
Y1 - 2018/10/15
AB - Visual place recognition is challenging in urban environments and is usually treated as a large-scale image retrieval task. Its intrinsic challenges are that confusing objects such as cars and trees occur frequently in complex urban scenes, and that buildings with repetitive structures may cause over-counting and burstiness, which degrade the image representations. To address these problems, we present an Attention-based Pyramid Aggregation Network (APANet), which is trained end-to-end for place recognition. One main component of APANet, spatial pyramid pooling, effectively encodes multi-size buildings containing geo-information. The other, the attention block, acts as a region evaluator that suppresses confusing regional features while highlighting discriminative ones. At test time, we further propose a simple yet effective PCA power whitening strategy, which significantly improves on the widely used PCA whitening by limiting the impact of over-counting. Experimental evaluations demonstrate that APANet outperforms state-of-the-art methods on two place recognition benchmarks and generalizes well to standard image retrieval datasets.
KW - Attention mechanism
KW - Content-based image retrieval
KW - Convolutional neural network
KW - Place recognition
UR - http://www.scopus.com/inward/record.url?scp=85058240859&partnerID=8YFLogxK
U2 - 10.1145/3240508.3240525
DO - 10.1145/3240508.3240525
M3 - Conference contribution
T3 - MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
SP - 99
EP - 107
BT - MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 26th ACM Multimedia conference, MM 2018
Y2 - 22 October 2018 through 26 October 2018
ER -