
Phase-Optimized K-SVD for Signal Extraction from Underdetermined Multichannel Sparse Mixtures

Published by: Antoine Deleforge
Publication date: 2014
Research field: Informatics Engineering
Language: English





We propose a novel sparse representation for heavily underdetermined multichannel sound mixtures, i.e., with many more sources than microphones. The proposed approach operates in the complex Fourier domain, thus preserving spatial characteristics carried by phase differences. We derive a generalization of K-SVD which jointly estimates a dictionary capturing both spectral and spatial features, a sparse activation matrix, and all instantaneous source phases from a set of signal examples. The dictionary can then be used to extract the learned signal from a new input mixture. The method is applied to the challenging problem of ego-noise reduction for robot audition. We demonstrate its superiority relative to conventional dictionary-based techniques using recordings made in a real room.
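For readers unfamiliar with K-SVD, the following is a minimal sketch of the real-valued K-SVD backbone that the paper generalizes: alternating between sparse coding (here via orthogonal matching pursuit) and rank-1 SVD updates of each dictionary atom. The phase-optimized variant proposed in the paper additionally operates on complex STFT coefficients and alternates over per-frame source phases, which this sketch omits; all function names and the toy data below are illustrative.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: k-sparse code of x over dictionary D."""
    residual, idx = x.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))   # most correlated atom
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)  # refit on chosen atoms
        residual = x - D[:, idx] @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code

def ksvd_step(Y, D, k):
    """One K-SVD iteration: sparse coding, then rank-1 update of each atom."""
    X = np.column_stack([omp(D, y, k) for y in Y.T])           # sparse activations
    for j in range(D.shape[1]):
        used = np.nonzero(X[j])[0]                             # examples using atom j
        if used.size == 0:
            continue
        # error matrix with atom j's contribution removed, restricted to 'used'
        E = Y[:, used] - D @ X[:, used] + np.outer(D[:, j], X[j, used])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)       # best rank-1 fit
        D[:, j], X[j, used] = U[:, 0], s[0] * Vt[0]
    return D, X

# toy usage: 64-dim training "spectra", 32 atoms, 3-sparse codes
rng = np.random.default_rng(0)
Y = np.abs(rng.standard_normal((64, 200)))
D = rng.standard_normal((64, 32)); D /= np.linalg.norm(D, axis=0)
for _ in range(5):
    D, X = ksvd_step(Y, D, k=3)
```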


Read also

This paper presents a computationally efficient approach to blind source separation (BSS) of audio signals, applicable even when there are more sources than microphones (i.e., the underdetermined case). When there are as many sources as microphones (i.e., the determined case), BSS can be performed computationally efficiently by independent component analysis (ICA). Unfortunately, however, ICA is basically inapplicable to the underdetermined case. Another BSS approach using the multichannel Wiener filter (MWF) is applicable even to this case, and encompasses full-rank spatial covariance analysis (FCA) and multichannel non-negative matrix factorization (MNMF). However, these methods require massive numbers of matrix
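As a rough illustration of the multichannel Wiener filter (MWF) this abstract refers to, the sketch below applies a per-frequency MWF given spatial covariance matrices of the target and of the residual; estimating those covariances (e.g., via FCA or MNMF) is the computationally heavy part the paper addresses and is simply assumed done here. Array shapes and names are illustrative.

```python
import numpy as np

def multichannel_wiener_filter(X, R_target, R_rest):
    """Per-frequency multichannel Wiener filter for a target spatial image.
    X: mixture STFT, shape (F, T, M); R_target, R_rest: (F, M, M) spatial
    covariances of the target source and of everything else."""
    W = R_target @ np.linalg.inv(R_target + R_rest)   # (F, M, M) filters
    return np.einsum('fmc,ftc->ftm', W, X)            # filter every time frame

# toy usage: 2 microphones, rank-1 target covariance, diffuse residual
rng = np.random.default_rng(0)
F, T, M = 129, 100, 2
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
A = rng.standard_normal((F, M, 1)) + 1j * rng.standard_normal((F, M, 1))
R_target = A @ A.conj().swapaxes(-1, -2)
R_rest = 0.1 * np.eye(M)[None].repeat(F, axis=0)
Y = multichannel_wiener_filter(X, R_target, R_rest)
```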
Taejin Park, Taejin Lee (2015)
In this paper, a Blind Source Separation (BSS) algorithm for multichannel audio contents is proposed. Unlike common BSS algorithms targeting stereo audio contents or microphone array signals, our technique is targeted at multichannel audio such as 5.1 and 7.1 ch audio. Since most multichannel audio object sources are panned using the Inter-channel Loudness Difference (ILD), we employ the ILVS (Inter-channel Loudness Vector Sum) concept to cluster common signals (such as background music) from each channel. After separating the common signals from each channel, we employ an Expectation Maximization (EM) algorithm with a von Mises distribution to cluster the sound source objects and separate the audio signals from the original mixture. Our proposed method can therefore separate common audio signals and object source signals from multiple channels with reasonable quality. Our multichannel audio content separation technique can be applied to an upmix system or a cinema audio system requiring multichannel audio source separation.
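The EM step with a von Mises distribution mentioned above can be sketched generically as fitting a mixture of von Mises components to a set of inter-channel angles. This is a textbook von Mises mixture EM (with a standard approximation for the concentration update), not the paper's ILVS-specific procedure, and the synthetic angles in the usage example are purely illustrative.

```python
import numpy as np
from scipy.special import i0

def vonmises_mixture_em(theta, K, n_iter=50, seed=0):
    """EM for a K-component von Mises mixture over angles theta (radians).
    Returns (weights, means, concentrations, responsibilities)."""
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)
    mu = rng.uniform(-np.pi, np.pi, K)
    kappa = np.full(K, 1.0)
    for _ in range(n_iter):
        # E-step: responsibilities under each von Mises component
        logp = (kappa[None] * np.cos(theta[:, None] - mu[None])
                - np.log(2 * np.pi * i0(kappa))[None] + np.log(w)[None])
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp); r /= r.sum(axis=1, keepdims=True)
        # M-step: circular mean and approximate concentration per component
        Nk = r.sum(axis=0)
        C = (r * np.cos(theta[:, None])).sum(axis=0)
        S = (r * np.sin(theta[:, None])).sum(axis=0)
        mu = np.arctan2(S, C)
        Rbar = np.clip(np.sqrt(C**2 + S**2) / Nk, 1e-6, 1 - 1e-6)
        kappa = Rbar * (2 - Rbar**2) / (1 - Rbar**2)   # standard approximation
        w = Nk / theta.size
    return w, mu, kappa, r

# toy usage: two angular clusters standing in for two panning directions
rng = np.random.default_rng(3)
theta = np.concatenate([rng.vonmises(0.5, 8, 2000), rng.vonmises(-2.0, 8, 2000)])
w, mu, kappa, r = vonmises_mixture_em(theta, K=2)
```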
Antoine Deleforge (2016)
We consider the problem of estimating the phases of K mixed complex signals from a multichannel observation, when the mixing matrix and signal magnitudes are known. This problem can be cast as a non-convex quadratically constrained quadratic program which is known to be NP-hard in general. We propose three approaches to tackle it: a heuristic method, an alternate minimization method, and a convex relaxation into a semi-definite program. The last two approaches are shown to outperform the oracle multichannel Wiener filter in under-determined informed source separation tasks, using simulated and speech signals. The convex relaxation approach yields the best results, including the potential for exact source separation in under-determined settings.
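The alternate-minimization idea can be sketched as cyclic coordinate updates of the source phases, each solved in closed form given the others, under the stated model (known mixing matrix A and magnitudes r). This is a generic sketch of that strategy, not the paper's exact algorithm or its semi-definite relaxation; all names are illustrative.

```python
import numpy as np

def alternating_phase_fit(x, A, r, n_iter=50):
    """Estimate source phases phi minimizing ||x - A @ (r * exp(1j*phi))||^2
    by cyclic coordinate updates, one source at a time."""
    K = A.shape[1]
    phi = np.zeros(K)
    for _ in range(n_iter):
        for k in range(K):
            s = r * np.exp(1j * phi)
            s[k] = 0.0                                   # remove source k
            resid = x - A @ s                            # residual it must explain
            phi[k] = np.angle(np.vdot(A[:, k], resid))   # closed-form best phase
    return phi

# toy usage: 2 channels, 3 sources (under-determined), known magnitudes
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
true_phase = rng.uniform(0, 2 * np.pi, 3)
r = rng.uniform(0.5, 1.5, 3)
x = A @ (r * np.exp(1j * true_phase))
phi = alternating_phase_fit(x, A, r)
```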
In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is a (much smaller) matrix whose sum of squared distances of the rows to any hyperplane approximates that of the original matrix to a $1 \pm \epsilon$ factor. Our main result is that we can maintain an $\epsilon$-coreset while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds of $\Omega(d / \epsilon^2)$ rows show that this is nearly optimal. Moreover, each row of our coreset is a weighted subset of the input rows. This is highly desirable since it: (1) preserves sparsity; (2) is easily interpretable; (3) avoids precision errors; (4) applies to problems with constraints on the input. Previous streaming results for SVD that return a subset of the input required storing $\Omega(d \log^3 n / \epsilon^2)$ rows, where $n$ is the number of rows seen so far. Our algorithm, with storage independent of $n$, is the first result that uses finite memory on infinite streams. We support our findings with experiments on the Wikipedia dataset benchmarked against state-of-the-art algorithms.
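The quantity being preserved here is, for any hyperplane through the origin with unit normal $w$, the sum of squared row distances $\|Aw\|^2$. As a non-streaming, illustrative stand-in for such a weighted-row-subset summary, the sketch below reweights a leverage-score row sample so that it preserves $\|Aw\|^2$ in expectation for every $w$; the paper's actual contribution is a deterministic-size streaming construction, which this does not reproduce.

```python
import numpy as np

def leverage_score_sample(A, m, rng):
    """Sample m rows of A with probability proportional to leverage scores,
    reweighting so the subset matches A^T A (hence ||Aw||^2) in expectation."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(U**2, axis=1)              # leverage score of each row
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / np.sqrt(m * p[idx])     # importance-sampling rescale
    return A[idx] * weights[:, None]

# check: for a random hyperplane normal w, ||C w||^2 is close to ||A w||^2
rng = np.random.default_rng(2)
A = rng.standard_normal((5000, 20))
C = leverage_score_sample(A, m=400, rng=rng)
w = rng.standard_normal(20); w /= np.linalg.norm(w)
print(np.linalg.norm(A @ w)**2, np.linalg.norm(C @ w)**2)
```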
Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least a strong advantage, in addition to enabling low-latency processing. In a previous work, we addressed the speaker counting problem with a multichannel convolutional recurrent neural network which produces an estimate at a short-term frame resolution. In this work, we show that, for a given frame, there is an optimal position in the input sequence for best prediction accuracy. We empirically demonstrate the link between that optimal position, the length of the input sequence and the size of the convolutional filters.