Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip when the onset and offset times of those events are not provided. Previous works have used the multiple instance learning (MIL) framework, exploiting the information in the whole audio clip through MIL pooling functions. However, detailed information about the sound events, such as their durations, may not be considered under this framework. To address this issue, we propose a novel two-stream framework for audio tagging that exploits both the global and the local information of sound events. The global stream analyzes the whole audio clip and uses a class-wise selection module to identify the local clips that need to be attended to. These clips are then fed to the local stream, which exploits the detailed information for a better decision. Experimental results on AudioSet show that our proposed method can significantly improve the performance of audio tagging under different baseline network architectures.
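The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: MIL pooling of frame-level predictions into a clip-level prediction, and choosing a local clip per class. The function names (`mil_pool`, `select_local_window`), the fixed window length, and the "highest summed score" selection rule are all assumptions made for illustration. They are not the authors' class-wise selection module.

```python
from typing import List, Tuple


def mil_pool(frame_probs: List[List[float]], mode: str = "max") -> List[float]:
    """Aggregate frame-level class probabilities (T x C) into clip-level ones (C).

    "max" and "mean" are two common MIL pooling functions; the abstract
    does not say which one the baselines use.
    """
    if not frame_probs:
        raise ValueError("frame_probs must contain at least one frame")
    num_classes = len(frame_probs[0])
    # Transpose to one list of frame scores per class.
    columns = [[frame[c] for frame in frame_probs] for c in range(num_classes)]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def select_local_window(
    frame_probs: List[List[float]], class_index: int, window: int
) -> Tuple[int, int]:
    """Return (start, end) of the window with the highest summed score for one class.

    Hypothetical stand-in for a class-wise selection step: it simply picks
    the fixed-length run of frames where the given class scores highest.
    """
    scores = [frame[class_index] for frame in frame_probs]
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and the number of frames")
    best_start = 0
    best_sum = current = sum(scores[:window])
    # Slide the window one frame at a time, updating the running sum.
    for start in range(1, len(scores) - window + 1):
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_start + window


# Toy example: 4 frames, 2 classes (values are made up).
frame_probs = [[0.1, 0.8], [0.2, 0.9], [0.7, 0.1], [0.9, 0.2]]
clip_max = mil_pool(frame_probs, "max")
clip_mean = mil_pool(frame_probs, "mean")
window_class0 = select_local_window(frame_probs, class_index=0, window=2)
window_class1 = select_local_window(frame_probs, class_index=1, window=2)
```

In this toy input, class 0 is active in the last two frames and class 1 in the first two, so the selected windows differ by class. In a two-stream setup of the kind described, such windows would be the local clips passed on for finer analysis.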
This paper proposes a network architecture mainly designed for audio tagging, which can also be used for weakly supervised acoustic event detection (AED). The proposed network consists of a modified DenseNet as the feature extractor, and a global ave
Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance. The outputs of larger teacher models are used to guide the training of smaller student models. Given the repetit
With the development of deep learning and artificial intelligence, audio synthesis has a pivotal role in the area of machine learning and shows strong applicability in the industry. Meanwhile, significant efforts have been dedicated by researchers to
Many name tagging approaches use local contextual information with much success, but fail when the local context is ambiguous or limited. We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextu
Acoustic scene classification systems using deep neural networks classify given recordings into pre-defined classes. In this study, we propose a novel scheme for acoustic scene classification which adopts an audio tagging system inspired by the human