Spatial-Temporal-Class Attention Network for Acoustic Scene Classification

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    2 Citations (Scopus)

    Abstract

    Acoustic scene classification, where a scene is identified from a sound recording, is a difficult problem that is much less studied than similar problems in computer vision. Recent advances in attention-based convolutional neural networks (CNNs) can be applied to audio data by operating on two-dimensional spectrograms, where frequency and time information have been separated, rather than on a raw audio signal. Typical CNNs have difficulty coping with this problem due to the temporal aspects of acoustic data. In this research we propose a novel and intuitive CNN-based architecture with attention mechanisms called the spatial-temporal-class attention network (STCANet). The STCANet consists of a spatial-temporal attention and a class attention, which extract information along the frequency, temporal, and class dimensions of spectrograms. In our experiments, the STCANet achieved 75.6%, 95.4%, and 97.0% accuracy on the TUT 2018, TAU 2020, and ESC-10 datasets, respectively, results that are competitive with previous work. Our contributions include this novel network design and a detailed analysis of how attention allows these results to be achieved.
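
To make the idea of attending along the frequency, temporal, and class dimensions concrete, the following is a minimal NumPy sketch. It is an illustrative assumption only and not the STCANet design: the function names (`axis_attention`, `class_attention`), the mean-pooling scores, and the absence of learned parameters are all hypothetical simplifications chosen for brevity.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def axis_attention(feature_map):
    """Reweight a (channels, freq, time) feature map along freq and time.

    Hypothetical simplification: scores are plain averages rather than
    the output of learned layers.
    """
    # One score per frequency bin (average over channels and time).
    freq_scores = feature_map.mean(axis=(0, 2))
    # One score per time frame (average over channels and frequency).
    time_scores = feature_map.mean(axis=(0, 1))
    freq_weights = softmax(freq_scores, axis=0)
    time_weights = softmax(time_scores, axis=0)
    # Broadcast the two weight vectors over the feature map.
    attended = (
        feature_map
        * freq_weights[None, :, None]
        * time_weights[None, None, :]
    )
    return attended, freq_weights, time_weights


def class_attention(class_maps):
    """Pool a (classes, freq, time) score map into class probabilities.

    Each class map is pooled with its own softmax weights over all
    time-frequency positions, so salient regions dominate the score.
    """
    flat = class_maps.reshape(class_maps.shape[0], -1)
    weights = softmax(flat, axis=1)
    logits = (weights * flat).sum(axis=1)
    return softmax(logits, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for CNN features computed from a spectrogram.
    features = rng.standard_normal((4, 8, 10))
    attended, freq_w, time_w = axis_attention(features)
    # Stand-in for per-class score maps (3 hypothetical scene classes).
    probs = class_attention(rng.standard_normal((3, 8, 10)))
    print(attended.shape, freq_w.shape, time_w.shape, probs.shape)
```

In a trained network the scores would come from learned layers instead of simple averages; the sketch only shows how weights over each axis can rescale a spectrogram-derived feature map.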

    Original language: English
    Title of host publication: ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
    Publisher: IEEE Computer Society
    Number of pages: 6
    ISBN (Electronic): 9781665485630
    Publication status: Published - 26 Aug 2022
    Event: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022 - Taipei, Taiwan
    Duration: 18 Jul 2022 → 22 Jul 2022

    Publication series

    Name: Proceedings - IEEE International Conference on Multimedia and Expo
    Volume: 2022-July
    ISSN (Print): 1945-7871
    ISSN (Electronic): 1945-788X

    Conference

    Conference: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
    Country/Territory: Taiwan
    City: Taipei
    Period: 18/07/22 → 22/07/22
