SemStyle: Learning to Generate Stylised Image Captions Using Unaligned Text

Alexander Mathews, Lexing Xie, Xuming He

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    97 Citations (Scopus)

    Abstract

    Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work provides possibilities to learn richer image descriptions from the plethora of linguistic data available on the web.

    Original language: English
    Title of host publication: Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
    Publisher: IEEE Computer Society
    Pages: 8591-8600
    Number of pages: 10
    ISBN (Electronic): 9781538664209
    Publication status: Published - 14 Dec 2018
    Event: 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 - Salt Lake City, United States
    Duration: 18 Jun 2018 to 22 Jun 2018

    Publication series

    Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    ISSN (Print): 1063-6919

    Conference

    Conference: 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
    Country/Territory: United States
    City: Salt Lake City
    Period: 18/06/18 to 22/06/18
