The IKEA ASM Dataset: Understanding people assembling furniture through actions, objects and pose

Yizhak Ben Shabat, Xin Yu, Fatemehsadat Saleh, Dylan Campbell, Cristian Rodriguez Opazo, Hongdong Li, Stephen Gould

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    47 Citations (Scopus)

    Abstract

    The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks. In the context of understanding human activities, existing public datasets, while large in size, are often limited to a single RGB camera and provide only per-frame or per-clip action annotations. To enable richer analysis and understanding of human activities, we introduce IKEA ASM - a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human poses. Additionally, we benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset. The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
    Original language: English
    Title of host publication: Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
    Place of Publication: United States
    Publisher: IEEE
    Pages: 846-858
    ISBN (Print): 978-1-6654-0477-8
    Publication status: Published - 2021
    Event: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 - Virtual, Waikoloa, HI, USA
    Duration: 1 Jan 2021 → …
    https://wacv2021.thecvf.com/home

    Conference

    Conference: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
    Period: 1/01/21 → …
    Other: January 5-9, 2021
