Abstract
A system capable of interpreting affect from a speaking face must recognise and fuse signals from multiple cues. Building such a system requires the integration of software components that perform tasks such as image registration, video segmentation, speech recognition, and classification. Such components tend to be idiosyncratic, purpose-built, and driven by scripts and textual configuration files, so integrating them with the degree of flexibility needed for full multimodal affective recognition is challenging. We discuss the key requirements and describe a multimodal affect sensing system that integrates such software components and meets these requirements.
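To make the integration concern concrete, the following is a minimal sketch of one way a layer over script-driven tools could look: each heterogeneous tool is wrapped behind a uniform interface and chained into a pipeline. All names here (`Component`, `run_pipeline`, the example stages) are illustrative assumptions, not the architecture described in the paper.

```python
"""Hypothetical integration layer for a pipeline of script-driven tools.

Each Component wraps one purpose-built tool (its own CLI, its own config)
behind a uniform run() interface so stages can be chained generically.
"""

import subprocess
import sys
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Component:
    """One script-driven tool; `command` is a template with {src}/{dst}
    placeholders filled in per invocation, plus any config values."""
    name: str
    command: list[str]
    config: dict[str, str] = field(default_factory=dict)

    def run(self, src: Path, dst: Path) -> Path:
        args = [a.format(src=src, dst=dst, **self.config) for a in self.command]
        subprocess.run(args, check=True)  # fail loudly if a stage breaks
        return dst


def run_pipeline(stages: list[Component], source: Path, workdir: Path) -> Path:
    """Feed each stage's output file into the next stage's input."""
    current = source
    for stage in stages:
        current = stage.run(current, workdir / f"{stage.name}.out")
    return current


if __name__ == "__main__":
    # Placeholder stages implemented with `python -c` so the sketch runs
    # anywhere; real stages would be tools for registration, segmentation,
    # recognition, and classification as mentioned in the abstract.
    copy = [sys.executable, "-c",
            "import sys, shutil; shutil.copy(sys.argv[1], sys.argv[2])",
            "{src}", "{dst}"]
    work = Path(".")
    seed = work / "input.txt"
    seed.write_text("frame data\n")
    stages = [Component("register", list(copy)),
              Component("segment", list(copy))]
    print(run_pipeline(stages, seed, work))
```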
| Original language | English |
| --- | --- |
| Pages (from-to) | 2767-2770 |
| Number of pages | 4 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Publication status | Published - 2008 |
| Event | INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association, Brisbane, QLD, Australia. Duration: 22 Sept 2008 → 26 Sept 2008 |