TCA-NET: TRIPLET CONCATENATED-ATTENTIONAL NETWORK FOR MULTIMODAL ENGAGEMENT ESTIMATION

Hongyuan He, Daming Wang, Md Rakibul Hasan, Tom Gedeon, Md Zakir Hossain

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

Human social interactions involve intricate social signals that artificial intelligence and machine learning models aim to decipher, particularly for artificial mediators that can enhance human interactions in domains such as education and healthcare. Engagement, a key aspect of these interactions, relies heavily on multimodal information such as facial expressions, voice and posture. Many deep learning methods have recently been applied to engagement estimation, but they often use only one or two modalities, so their results lack robustness and adaptability in the face of noise and varying individual responses. To address this challenge, we introduce a novel modality fusion framework named Triplet Concatenated-Attentional Net (TCA-Net). The framework takes three data modalities (video, audio and Kinect) as inputs and outputs a predicted engagement score. Within the network, a specially designed concatenated-attention fusion mechanism fuses the modalities while preserving intra-modal features. Experimental results validate the effectiveness of TCA-Net in enhancing the accuracy and reliability of engagement estimation across diverse scenarios, with a test-set Concordance Correlation Coefficient (CCC) of 0.75. We release our code at https://github.com/Daming-W/Multimodal_Engagement_Estimation.
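
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the Concordance Correlation Coefficient under Lin's standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (divide-by-n) statistics. The function name and the sample values are illustrative assumptions; they are not taken from the paper or its released code, which may aggregate scores differently.

```python
from statistics import fmean


def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC between two equal-length sequences of numbers.

    Hypothetical helper for illustration only; not the authors' code.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = fmean(y_true)
    mean_pred = fmean(y_pred)

    # Population (divide-by-n) variance and covariance.
    var_true = fmean([(t - mean_true) ** 2 for t in y_true])
    var_pred = fmean([(p - mean_pred) ** 2 for p in y_pred])
    cov = fmean([(t - mean_true) * (p - mean_pred)
                 for t, p in zip(y_true, y_pred)])

    denominator = var_true + var_pred + (mean_true - mean_pred) ** 2
    if denominator == 0:
        # Both sequences are constant and equal; CCC is undefined here.
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denominator


# Illustrative values only (not data from the paper).
labels = [0.1, 0.4, 0.5, 0.9]
predictions = [0.2, 0.35, 0.6, 0.8]
print(round(concordance_correlation_coefficient(labels, predictions), 3))
```

Unlike Pearson correlation, CCC penalises a systematic offset or scale difference between predictions and labels: predictions that are perfectly correlated with the labels but shifted by a constant score below 1.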

Original language: English
Title of host publication: 2024 IEEE International Conference on Image Processing, ICIP 2024 - Proceedings
Publisher: IEEE Computer Society
Pages: 2062-2068
Number of pages: 7
ISBN (Electronic): 9798350349399
Publication status: Published - 2024
Event: 31st IEEE International Conference on Image Processing, ICIP 2024 - Abu Dhabi, United Arab Emirates
Duration: 27 Oct 2024 to 30 Oct 2024

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 31st IEEE International Conference on Image Processing, ICIP 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 27/10/24 to 30/10/24
