ﻻ يوجد ملخص باللغة العربية
Weakly Labelled learning has garnered lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as Multiple Instance Learning (MIL) problem. This paper proposes a Multi-Task Learning (MTL) framework for learning from Weakly Labelled Audio data which encompasses the traditional MIL setup. To show the utility of proposed framework, we use the input TimeFrequency representation (T-F) reconstruction as the auxiliary task. We show that the chosen auxiliary task de-noises internal T-F representation and improves SED performance under noisy recordings. Our second contribution is introducing two step Attention Pooling mechanism. By having 2-steps in attention mechanism, the network retains better T-F level information without compromising SED performance. The visualisation of first step and second step attention weights helps in localising the audio-event in T-F domain. For evaluating the proposed framework, we remix the DCASE 2019 task 1 acoustic scene data with DCASE 2018 Task 2 sounds event data under 0, 10 and 20 db SNR resulting in a multi-class Weakly labelled SED problem. The proposed total framework outperforms existing benchmark models over all SNRs, specifically 22.3 %, 12.8 %, 5.9 % improvement over benchmark model on 0, 10 and 20 dB SNR respectively. We carry out ablation study to determine the contribution of each auxiliary task and 2-step Attention Pooling to the SED performance improvement. The code is publicly released
Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Much research turns to address the problem of how to detect both the types and the timestamps of sound events with weak labels that on
Robust voice activity detection (VAD) is a challenging task in low signal-to-noise (SNR) environments. Recent studies show that speech enhancement is helpful to VAD, but the performance improvement is limited. To address this issue, here we propose a
While multitask and transfer learning has shown to improve the performance of neural networks in limited data settings, they require pretraining of the model on large datasets beforehand. In this paper, we focus on improving the performance of weakly
Automatic speaker verification systems are vulnerable to audio replay attacks which bypass security by replaying recordings of authorized speakers. Replay attack detection (RA) detection systems built upon Residual Neural Networks (ResNet)s have yiel
The weakly supervised sound event detection problem is the task of predicting the presence of sound events and their corresponding starting and ending points in a weakly labeled dataset. A weak dataset associates each training sample (a short recordi