Guiding the long-short term memory model for image caption generation

Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    365 Citations (Scopus)

    Abstract

    In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search to avoid bias towards short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or better than the current state-of-the-art.
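
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function and parameter names (`glstm_step`, `params`, `normalized_score`, `rank_candidates`, `alpha`) are hypothetical. The exact gate equations, including the `tanh` on the output, follow a common LSTM variant and are an assumption. The power-law length normalization is one simple strategy and is not necessarily among those the paper evaluates. The first function feeds a fixed guidance vector `g` (semantic information from the image) into every gate at each step. The second part rescales beam-search log-probabilities by sentence length.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(row, vec):
    return sum(a * b for a, b in zip(row, vec))


def glstm_step(x, m_prev, c_prev, g, params):
    """One guided-LSTM step on plain Python lists (illustrative only).

    params maps each of "i", "f", "o", "c" to a triple (W_x, W_m, W_g)
    of weight matrices given as lists of rows. g is the guidance vector
    and stays the same at every time step. Biases are omitted.
    """
    def pre(name):
        # Pre-activation: word input + previous output + guidance term.
        W_x, W_m, W_g = params[name]
        return [dot(rx, x) + dot(rm, m_prev) + dot(rg, g)
                for rx, rm, rg in zip(W_x, W_m, W_g)]

    i = [sigmoid(v) for v in pre("i")]        # input gate
    f = [sigmoid(v) for v in pre("f")]        # forget gate
    o = [sigmoid(v) for v in pre("o")]        # output gate
    cand = [math.tanh(v) for v in pre("c")]   # candidate cell update
    c = [fv * cp + iv * cv for fv, cp, iv, cv in zip(f, c_prev, i, cand)]
    m = [ov * math.tanh(cv) for ov, cv in zip(o, c)]  # assumed output form
    return m, c


def normalized_score(log_prob, length, alpha=1.0):
    """Divide a sentence log-probability by length**alpha (assumed scheme)."""
    return log_prob / (length ** alpha)


def rank_candidates(candidates, alpha=1.0):
    """Sort (tokens, log_prob) pairs from best to worst normalized score."""
    return sorted(
        candidates,
        key=lambda cand: normalized_score(cand[1], len(cand[0]), alpha),
        reverse=True,
    )
```

With `alpha=0.0` the ranking uses the raw log-probability, which tends to favor short captions because every extra word lowers the score. With `alpha=1.0` it uses the average log-probability per word, so a longer caption can outrank a shorter one.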

    Original language: English
    Title of host publication: 2015 International Conference on Computer Vision, ICCV 2015
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2407-2415
    Number of pages: 9
    ISBN (Electronic): 9781467383912
    DOIs
    Publication status: Published - 17 Feb 2015
    Event: 15th IEEE International Conference on Computer Vision, ICCV 2015 - Santiago, Chile
    Duration: 11 Dec 2015 → 18 Dec 2015

    Publication series

    Name: Proceedings of the IEEE International Conference on Computer Vision
    Volume: 2015 International Conference on Computer Vision, ICCV 2015
    ISSN (Print): 1550-5499

    Conference

    Conference: 15th IEEE International Conference on Computer Vision, ICCV 2015
    Country/Territory: Chile
    City: Santiago
    Period: 11/12/15 → 18/12/15
