In this work, we propose a novel self-attention-based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network (CRNN), we investigate the benefit of replacing the recurrent layers with self-attention encoders inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data with up to 3 simultaneous speakers. The obtained results indicate that most of the proposed architectures either perform on par with or outperform the CRNN baseline, especially in the multi-source scenario. Moreover, by avoiding recurrent layers, the proposed models lend themselves to parallel computing, which is shown to yield considerable savings in execution time.
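To make the architectural substitution concrete, below is a minimal PyTorch sketch of the kind of model the abstract describes: a shared convolutional front-end followed either by a recurrent block (the CRNN baseline) or by a Transformer self-attention encoder. The 4-channel first-order Ambisonics input, all layer sizes, and the output grid of candidate directions are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class LocalizationNet(nn.Module):
    """Toy multi-speaker localization model: CNN front-end + temporal block."""

    def __init__(self, use_self_attention: bool = True,
                 in_channels: int = 4, feat_dim: int = 64, n_doa: int = 360):
        super().__init__()
        # Shared convolutional front-end over (time, frequency) feature maps.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),             # pool along frequency only
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((None, 1)),  # collapse the remaining frequency axis
        )
        self.use_self_attention = use_self_attention
        if use_self_attention:
            # Transformer-encoder replacement for the recurrent layers.
            layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                               dim_feedforward=256,
                                               batch_first=True)
            self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        else:
            # CRNN-style baseline: bidirectional GRU over the time axis.
            self.temporal = nn.GRU(feat_dim, feat_dim // 2, num_layers=2,
                                   batch_first=True, bidirectional=True)
        # Per-frame activity over a hypothetical grid of candidate directions.
        self.head = nn.Linear(feat_dim, n_doa)

    def forward(self, x):
        # x: (batch, channels, time, freq) Ambisonics feature maps
        h = self.cnn(x).squeeze(-1).transpose(1, 2)  # -> (batch, time, feat_dim)
        if self.use_self_attention:
            h = self.temporal(h)
        else:
            h, _ = self.temporal(h)
        return torch.sigmoid(self.head(h))           # per-frame source likelihoods


if __name__ == "__main__":
    x = torch.randn(2, 4, 100, 64)                   # dummy FOA feature batch
    for flag in (False, True):
        y = LocalizationNet(use_self_attention=flag)(x)
        print("self-attention" if flag else "GRU", y.shape)
```

Because the self-attention variant processes all time frames jointly rather than sequentially, its temporal block can be parallelized across frames, which is the source of the execution-time savings mentioned above.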