From a machine learning perspective, the human ability to localize sounds can be modeled as a non-parametric and non-linear regression problem between binaural spectral features of sound received at the ears (input) and their sound-source directions (output). The input features can be summarized in terms of the individual's head-related transfer functions (HRTFs), which measure the spectral response between the listener's eardrum and an external point in 3D. Based on these viewpoints, two related problems are considered: how can one achieve an optimal sampling of measurements for training sound-source localization (SSL) models, and how can SSL models be used to infer the subject's HRTFs in listening tests. First, we develop a class of binaural SSL models based on Gaussian process regression and solve a \emph{forward selection} problem that finds a subset of input-output samples that best generalizes to all SSL directions. Second, we use an \emph{active-learning} approach that updates an online SSL model for inferring the subject's SSL errors via headphones and a graphical user interface. Experiments show that only a small fraction of HRTFs are required for $5^{\circ}$ localization accuracy and that the learned HRTFs are localized closer to their intended directions than non-individualized HRTFs.
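The forward-selection idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the selection criterion, or the data, so the squared-exponential kernel, the fixed noise level, the mean-squared-error criterion, and the synthetic features standing in for binaural inputs are all assumptions made for illustration. The function names (`rbf_kernel`, `gp_predict`, `forward_select`) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, noise=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP regressor evaluated at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)


def forward_select(X, y, n_select, **gp_kwargs):
    """Greedy forward selection of training samples.

    At each step, add the candidate whose inclusion gives the lowest
    mean squared error of the GP prediction over ALL samples.
    """
    selected, errors = [], []
    remaining = list(range(len(X)))
    for _ in range(n_select):
        best_idx, best_err = None, np.inf
        for i in remaining:
            trial = selected + [i]
            pred = gp_predict(X[trial], y[trial], X, **gp_kwargs)
            err = float(np.mean((pred - y) ** 2))
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        remaining.remove(best_idx)
        errors.append(best_err)
    return selected, errors


# Synthetic stand-in data (assumption): 2-D features and a smooth scalar target.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1])

subset, mse_trace = forward_select(X, y, n_select=10, noise=1e-2, length_scale=1.0)
print("selected indices:", subset)
print("MSE after 1 and 10 samples:", mse_trace[0], mse_trace[-1])
```

In this sketch the error trace shows how quickly a small, well-chosen subset of samples covers the full set, which is the kind of question the forward-selection problem asks of HRTF measurements. A real SSL model would predict a direction (for example a unit vector or azimuth and elevation) rather than a scalar, and would tune the kernel hyperparameters instead of fixing them.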
This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environments, where reverberation and diffuse noise are present. We pro
In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout b
Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g. a microphone array). Recent advances in this domain most prominently focused on utilizing deep recurrent neural n
In this paper, we describe our method for DCASE2019 task3: Sound Event Localization and Detection (SELD). We use four CRNN SELDnet-like single output models which run in a consecutive manner to recover all possible information of occurring events. We
We investigate active learning in Gaussian Process state-space models (GPSSM). Our problem is to actively steer the system through latent states by determining its inputs such that the underlying dynamics can be optimally learned by a GPSSM. In order