ﻻ يوجد ملخص باللغة العربية
We propose a novel sparse representation for heavily underdetermined multichannel sound mixtures, i.e., with much more sources than microphones. The proposed approach operates in the complex Fourier domain, thus preserving spatial characteristics carried by phase differences. We derive a generalization of K-SVD which jointly estimates a dictionary capturing both spectral and spatial features, a sparse activation matrix, and all instantaneous source phases from a set of signal examples. The dictionary can then be used to extract the learned signal from a new input mixture. The method is applied to the challenging problem of ego-noise reduction for robot audition. We demonstrate its superiority relative to conventional dictionary-based techniques using recordings made in a real room.
This paper presents a computationally efficient approach to blind source separation (BSS) of audio signals, applicable even when there are more sources than microphones (i.e., the underdetermined case). When there are as many sources as microphones (
In this paper, a Blind Source Separation (BSS) algorithm for multichannel audio contents is proposed. Unlike common BSS algorithms targeting stereo audio contents or microphone array signals, our technique is targeted at multichannel audio such as 5.
We consider the problem of estimating the phases of K mixed complex signals from a multichannel observation, when the mixing matrix and signal magnitudes are known. This problem can be cast as a non-convex quadratically constrained quadratic program
In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $mathbb{R}^d$. An $epsilon$-coreset is a (much smaller) matrix whose sum of square distances of the rows to any hyper
Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speak