Deep0Tag: Deep Multiple Instance Learning for Zero-Shot Image Tagging

Shafin Rahman*, Salman Khan, Nick Barnes

*Corresponding author for this work

    Research output: Contribution to journal, Article, peer-review

    26 Citations (Scopus)


    Zero-shot learning aims to perform visual reasoning about unseen objects. In line with the success of deep learning on object recognition problems, several end-to-end deep models for zero-shot recognition have been proposed in the literature. These models are successful in predicting a single unseen label given an input image but do not scale to cases where multiple unseen objects are present. Here, we focus on the challenging problem of zero-shot image tagging, where an image is assigned multiple labels that may relate to objects, attributes, actions, events, and scene type. Discovery of these scene concepts requires the ability to process multi-scale information. To encompass global as well as local image details, we propose an automatic approach to locate relevant image patches and model image tagging within the Multiple Instance Learning (MIL) framework. To the best of our knowledge, we propose the first end-to-end trainable deep MIL framework for the multi-label zero-shot tagging problem. We explore several alternatives for instance-level evidence aggregation and perform an extensive ablation study to identify the optimal pooling strategy. Due to its novel design, the proposed framework has several interesting features: 1) Unlike previous deep MIL models, it does not use any off-line procedure (e.g., Selective Search or EdgeBoxes) for bag generation. 2) At test time, it can process any number of unseen labels given their semantic embedding vectors. 3) Using only image-level seen labels as weak annotation, it can produce a localized bounding box for each predicted label. We experiment with the large-scale NUS-WIDE and MS-COCO datasets and achieve superior performance across conventional, zero-shot, and generalized zero-shot tagging tasks.
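The abstract's central idea (scoring each image patch against label embeddings, then pooling instance-level evidence into image-level tag scores) can be illustrated with a minimal sketch. This is not the authors' network: the function names (`instance_scores`, `pool`, `tag`, `best_instance`), the three pooling modes, and the toy 3-dimensional vectors are all assumptions made for illustration, and a real system would use learned patch features and word embeddings instead.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def instance_scores(bag, label_vecs):
    """Score every instance (patch feature) against every label embedding.

    Assumes patch features already live in the same space as the label
    embeddings; a real model would learn that projection.
    """
    return [[dot(x, w) for w in label_vecs] for x in bag]

def pool(scores, mode="max", r=5.0):
    """Aggregate instance-level scores into one bag-level score per label."""
    n_labels = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(n_labels)]
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "lse":
        # Log-sum-exp, a smooth maximum: (1/r) * log(mean(exp(r * s))),
        # computed with the max subtracted for numerical stability.
        out = []
        for c in columns:
            m = max(c)
            mean_exp = sum(math.exp(r * (s - m)) for s in c) / len(c)
            out.append(m + math.log(mean_exp) / r)
        return out
    raise ValueError(f"unknown pooling mode: {mode}")

def tag(bag, labels, label_vecs, k=2, mode="max"):
    """Return the k labels with the highest pooled scores."""
    pooled = pool(instance_scores(bag, label_vecs), mode)
    ranked = sorted(zip(labels, pooled), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def best_instance(bag, label_vec):
    """Index of the patch that scores highest for one label.

    With max pooling, this patch is the natural candidate for
    localizing the label in the image.
    """
    scores = [dot(x, label_vec) for x in bag]
    return scores.index(max(scores))

# Toy data (hypothetical): three patches, three labels.
labels = ["dog", "beach", "car"]
label_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bag = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.1, 0.1, 0.1]]

top_tags = tag(bag, labels, label_vecs, k=2, mode="max")
```

Because the label set enters only through `label_vecs`, new labels can be scored at test time by appending their embedding vectors, without changing the rest of the sketch. This mirrors the property the abstract describes for unseen labels.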

    Original language: English
    Article number: 8744401
    Pages (from-to): 242-255
    Number of pages: 14
    Journal: IEEE Transactions on Multimedia
    Issue number: 1
    Publication status: Published - Jan 2020

