MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation

Xibin Song, Wei Li*, Dingfu Zhou, Yuchao Dai, Jin Fang, Hongdong Li, Liangjun Zhang

*Corresponding author for this work

    Research output: Contribution to journal > Article > peer-review

    40 Citations (Scopus)

    Abstract

    The success of supervised learning-based single-image depth estimation methods depends critically on the availability of large-scale dense per-pixel depth annotations, which require a laborious and expensive annotation process. Self-supervised methods are therefore highly desirable and have attracted significant attention recently. However, depth maps predicted by existing self-supervised methods tend to be blurry, with many depth details lost. To overcome these limitations, we propose a novel framework, named MLDA-Net, to obtain per-pixel depth maps with sharper boundaries and richer depth details. Our first innovation is a multi-level feature extraction (MLFE) strategy that learns a rich hierarchical representation. Then, a dual-attention strategy, combining global attention and structure attention, is proposed to strengthen the extracted features both globally and locally, resulting in improved depth maps with sharper boundaries. Finally, a reweighted loss strategy based on multi-level outputs is proposed to provide effective supervision for self-supervised depth estimation. Experimental results demonstrate that our MLDA-Net framework achieves state-of-the-art depth prediction results on the KITTI benchmark for self-supervised monocular depth estimation across different input modes and training modes. Extensive experiments on other benchmark datasets further confirm the superiority of our proposed approach.

    Original language: English
    Article number: 9416235
    Pages (from-to): 4691-4705
    Number of pages: 15
    Journal: IEEE Transactions on Image Processing
    Volume: 30
    Publication status: Published - 2021
