TY - JOUR
T1 - Multimodal assistive technologies for depression diagnosis and monitoring
AU - Joshi, Jyoti
AU - Goecke, Roland
AU - Alghowinem, Sharifa
AU - Dhall, Abhinav
AU - Wagner, Michael
AU - Epps, Julien
AU - Parker, Gordon
AU - Breakspear, Michael
PY - 2013/11
Y1 - 2013/11
N2 - Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the proposition that auditory and visual human communication complement each other, which is well known in auditory-visual speech processing, and investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at the feature level, score level and decision level. Experiments are performed on an age- and gender-matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework's effectiveness in depression analysis.
KW - Bag of words
KW - Depression analysis
KW - LBP-TOP
KW - Multimodal
UR - http://www.scopus.com/inward/record.url?scp=84891737185&partnerID=8YFLogxK
U2 - 10.1007/s12193-013-0123-2
DO - 10.1007/s12193-013-0123-2
M3 - Article
SN - 1783-7677
VL - 7
SP - 217
EP - 228
JO - Journal on Multimodal User Interfaces
JF - Journal on Multimodal User Interfaces
IS - 3
ER -