DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition

Yuhao Zhang, Md Zakir Hossain, Shafin Rahman*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    15 Citations (Scopus)

    Abstract

    Human facial expressions and bio-signals (e.g., electroencephalogram and electrocardiogram) play a vital role in emotion recognition. Recent approaches employ both vision-based and bio-sensing data to design multi-modal recognition systems. However, these approaches require extensive domain-specific knowledge and complex pre-processing steps, and they fail to take full advantage of the end-to-end nature of deep learning techniques. This paper proposes a deep end-to-end framework, DeepVANet, for multi-modal valence-arousal-based emotion recognition that applies deep learning methods to extract face appearance features and bio-sensing features. We use convolutional long short-term memory (ConvLSTM) techniques in face appearance feature extraction to capture spatial and temporal information from face image sequences. Instead of relying on conventional time- or frequency-domain features (e.g., spectral power and average signal intensity), we use a 1D convolutional neural network (Conv1D) to learn bio-sensing features automatically. We evaluate our method on the DEAP and MAHNOB-HCI datasets. The proposed multi-modal framework outperforms state-of-the-art single- and multi-modal methods, reaching an accuracy as high as 99.22%.
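
    As a rough illustration of the Conv1D idea mentioned in the abstract, the sketch below slides small kernels over a raw one-dimensional signal and pools the responses into a feature vector. It is not the authors' architecture: the toy signal, the kernel values, the helper names (`conv1d`, `relu`, `global_avg_pool`, `extract_features`) and the pooling choice are all assumptions made for illustration. In a trained network the kernel weights would be learned from data rather than fixed by hand.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """Valid 1D cross-correlation (no padding, no kernel flip), as in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def global_avg_pool(values: List[float]) -> float:
    """Collapse a feature map to a single number by averaging."""
    return sum(values) / len(values)


def extract_features(signal: List[float], kernels: List[List[float]]) -> List[float]:
    """One pooled feature per kernel: conv -> ReLU -> global average pooling."""
    return [global_avg_pool(relu(conv1d(signal, k))) for k in kernels]


# Hypothetical toy "bio-signal" (a short oscillation), not real EEG/ECG data.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]

# Hypothetical hand-set kernels; a real Conv1D layer would learn these weights.
kernels = [
    [1.0, 0.0, -1.0],  # responds to local differences
    [0.5, 0.5, 0.5],   # responds to the local level (scaled sum)
]

features = extract_features(signal, kernels)
print(features)
```

    The point of the sketch is only the data flow: the raw signal goes in, and each kernel yields one feature without any hand-crafted spectral or statistical descriptor being computed first.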

    Original language: English
    Title of host publication: Human-Computer Interaction – INTERACT 2021 - 18th IFIP TC 13 International Conference, Proceedings
    Editors: Carmelo Ardito, Rosa Lanzilotti, Alessio Malizia, Helen Petrie, Antonio Piccinno, Giuseppe Desolda, Kori Inkpen
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 227-237
    Number of pages: 11
    ISBN (Print): 9783030856120
    Publication status: Published - 2021
    Event: 18th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2021 - Virtual, Online
    Duration: 30 Aug 2021 – 3 Sept 2021

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 12934 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 18th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2021
    City: Virtual, Online
    Period: 30/08/21 – 3/09/21
