Unsupervised learning of endoscopy video frames’ correspondences from global and local transformation

Mohammad Ali Armin*, Nick Barnes, Salman Khan, Miaomiao Liu, Florian Grimpen, Olivier Salvado

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    7 Citations (Scopus)

    Abstract

    Inferring correspondences between consecutive video frames with high accuracy is essential for many medical image processing and computer vision tasks (e.g. image mosaicking, 3D scene reconstruction). Image correspondences can be computed by feature extraction and matching algorithms, but these are computationally expensive and are challenged by low-texture frames. Convolutional neural networks (CNNs) can estimate dense image correspondences with high accuracy, but the lack of labeled data, especially in medical imaging, precludes end-to-end supervised training. In this paper, we present an unsupervised learning method to estimate dense image correspondences (DIC) between endoscopy frames by developing a new CNN model, called the EndoRegNet. Our proposed network has three distinguishing aspects: a local DIC estimator, a polynomial image transformer that regularizes local correspondences, and a visibility mask that refines image correspondences. The EndoRegNet was trained on a mix of simulated and real endoscopy video frames, while its performance was evaluated on real endoscopy frames. We compared the results of EndoRegNet with traditional feature-based image registration. Our results show that EndoRegNet provides faster and more accurate estimation of image correspondences, and that it effectively deals with the deformations and occlusions common in endoscopy video frames without requiring any labeled data.
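    The abstract's idea of regularizing local correspondences with a global polynomial transform can be illustrated outside the network: a smooth second-order polynomial motion model is fit to noisy point matches by least squares, and re-evaluating it yields a regularized displacement field. This is a minimal NumPy sketch under that assumption, not the authors' CNN implementation; the function names are hypothetical.

    ```python
    import numpy as np

    def poly_basis(pts):
        """Second-order polynomial basis [1, x, y, x^2, x*y, y^2] per point."""
        x, y = pts[:, 0], pts[:, 1]
        return np.stack([np.ones_like(x), x, y, x**2, x * y, y**2], axis=1)

    def fit_polynomial_transform(src_pts, dst_pts):
        """Fit a global 2nd-order polynomial mapping src -> dst by least squares.

        src_pts, dst_pts: (N, 2) arrays of matched pixel coordinates.
        Returns a (6, 2) coefficient matrix (one column per output coordinate).
        """
        coeffs, *_ = np.linalg.lstsq(poly_basis(src_pts), dst_pts, rcond=None)
        return coeffs

    def apply_polynomial_transform(coeffs, pts):
        """Evaluate the fitted transform, giving regularized correspondences."""
        return poly_basis(pts) @ coeffs

    # Noisy local matches that roughly follow a smooth global motion:
    rng = np.random.default_rng(0)
    src = rng.uniform(0, 100, size=(200, 2))
    dst = src + np.array([3.0, -2.0]) + 0.5 * rng.normal(size=src.shape)

    coeffs = fit_polynomial_transform(src, dst)
    smoothed = apply_polynomial_transform(coeffs, src)  # outliers/noise averaged out
    ```

    Because the polynomial has only six parameters per coordinate, it cannot follow per-pixel noise, so the fitted field acts as a smoothness prior on the local estimates.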

    Original language: English
    Title of host publication: OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis - 1st International Workshop, OR 2.0 2018, 5th International Workshop, CARE 2018, 7th International Workshop, CLIP 2018, 3rd International Workshop, ISIC 2018, Held in Conjunction with MICCAI 2018
    Editors: Anand Malpani, Marco A. Zenati, Cristina Oyarzun Laura, M. Emre Celebi, Duygu Sarikaya, Noel C. Codella, Allan Halpern, Marius Erdt, Lena Maier-Hein, Luo Xiongbiao, Stefan Wesarg, Danail Stoyanov, Zeike Taylor, Klaus Drechsler, Kristin Dana, Anne Martel, Raj Shekhar, Sandrine De Ribaupierre, Tobias Reichl, Jonathan McLeod, Miguel Angel González Ballester, Toby Collins, Marius George Linguraru
    Publisher: Springer Verlag
    Pages: 108-117
    Number of pages: 10
    ISBN (Print): 9783030012007
    Publication status: Published - 2018
    Event: 1st International Workshop on OR 2.0 Context-Aware Operating Theaters, OR 2.0 2018, 5th International Workshop on Computer Assisted Robotic Endoscopy, CARE 2018, 7th International Workshop on Clinical Image-Based Procedures, CLIP 2018, and 3rd International Workshop on Skin Image Analysis, ISIC 2018, held in conjunction with the 21st International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2018 - Granada, Spain
    Duration: 16 Sept 2018 – 20 Sept 2018

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11041 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 1st International Workshop on OR 2.0 Context-Aware Operating Theaters, OR 2.0 2018, 5th International Workshop on Computer Assisted Robotic Endoscopy, CARE 2018, 7th International Workshop on Clinical Image-Based Procedures, CLIP 2018, and 3rd International Workshop on Skin Image Analysis, ISIC 2018, held in conjunction with the 21st International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2018
    Country/Territory: Spain
    City: Granada
    Period: 16/09/18 – 20/09/18
