ﻻ يوجد ملخص باللغة العربية
In this paper, we presents a low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: Front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter and Constant Q Transfrom (CQT) to transform raw audio signal into spectrograms, where both frequency and temporal features are presented. Three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), classifying into ten urban scenes. Finally, a late fusion of three predicted probabilities obtained from three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, which are conducted on DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN based framework with 128 KB trainable parameters and the best classification accuracy of 66.7%, improving DCASE baseline by 19.0%
This thesis focuses on dealing with the task of acoustic scene classification (ASC), and then applied the techniques developed for ASC to a real-life application of detecting respiratory disease. To deal with ASC challenges, this thesis addresses thr
In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on
Convolutional neural networks (CNNs) with log-mel spectrum features have shown promising results for acoustic scene classification tasks. However, the performance of these CNN based classifiers is still lacking as they do not generalise well for unkn
This paper describes an acoustic scene classification method which achieved the 4th ranking result in the IEEE AASP challenge of Detection and Classification of Acoustic Scenes and Events 2016. In order to accomplish the ensuing task, several methods
In a recent acoustic scene classification (ASC) research field, training and test device channel mismatch have become an issue for the real world implementation. To address the issue, this paper proposes a channel domain conversion using factorized h