Detail preserving depth estimation from a single image using attention guided networks

Zhixiang Hao, Yu Li, Shaodi You, Feng Lu*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    86 Citations (Scopus)

    Abstract

    Convolutional neural networks have demonstrated superior performance on single-image depth estimation in recent years. These works usually rely on stacked spatial pooling or strided convolution to obtain high-level information, a common practice in classification tasks. However, depth estimation is a dense prediction problem, and low-resolution feature maps usually produce blurred depth maps, which is undesirable in applications. To produce high-quality depth maps, that is, clean and accurate ones, we propose a network consisting of a Dense Feature Extractor (DFE) and a Depth Map Generator (DMG). The DFE combines ResNet and dilated convolutions. It extracts multi-scale information from the input image while keeping the feature maps dense. In the DMG, we use an attention mechanism to fuse the multi-scale features produced by the DFE. Our network is trained end-to-end and does not need any post-processing. Hence, it runs fast and can predict depth maps at about 15 fps. Experimental results show that our method is competitive with the state of the art in quantitative evaluation while better preserving the structural details of the scene depth.
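
    The two ideas named in the abstract, dilated convolution that keeps feature maps dense and attention-based fusion of multi-scale features, can be illustrated with a toy 1D sketch. This is not the authors' network: the kernel, the dilation rates (1, 2, 4), and the use of feature magnitudes as attention scores are all assumptions made for illustration, and in a real model the scores would be learned.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D dilated convolution; output length equals input length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, scores):
    """Fuse per-scale features with a per-position softmax over scales."""
    fused = []
    for i in range(len(features[0])):
        weights = softmax([s[i] for s in scores])
        fused.append(sum(w * f[i] for w, f in zip(weights, features)))
    return fused


# Hypothetical input: a step edge, standing in for a depth discontinuity.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
kernel = [1 / 3, 1 / 3, 1 / 3]  # hypothetical averaging kernel

# Larger dilation widens the receptive field without any downsampling.
features = [dilated_conv1d(signal, kernel, d) for d in (1, 2, 4)]

# Assumption: feature magnitude stands in for learned attention scores.
scores = [[abs(v) for v in f] for f in features]
fused = attention_fuse(features, scores)
```

    Every per-scale output has the same length as the input, which is the sense in which dilation keeps the features dense, and each fused value is a convex combination of the per-scale values at that position.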

    Original language: English
    Title of host publication: Proceedings - 2018 International Conference on 3D Vision, 3DV 2018
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 304-313
    Number of pages: 10
    ISBN (Electronic): 9781538684252
    Publication status: Published - 12 Oct 2018
    Event: 6th International Conference on 3D Vision, 3DV 2018 - Verona, Italy
    Duration: 5 Sept 2018 to 8 Sept 2018

    Publication series

    Name: Proceedings - 2018 International Conference on 3D Vision, 3DV 2018

    Conference

    Conference: 6th International Conference on 3D Vision, 3DV 2018
    Country/Territory: Italy
    City: Verona
    Period: 5/09/18 to 8/09/18
