Phase space methods and psychoacoustic models in lossy transform coding

Published by Matthew Cargo
Publication date: 2007
Research field: Informatics engineering
Language: English





I present a method for lossy transform coding of digital audio that uses the Weyl symbol calculus to construct the encoding and decoding transformations. The method establishes a direct connection between a time-frequency representation of the signal-dependent threshold of masked noise and the encode/decode pair. The formalism also offers a time-frequency measure of perceptual entropy.


Read also

We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate-distortion trade-off of nonlinear transforms, introducing a simplified one.
We consider the phase retrieval problem, in which the observer wishes to recover an $n$-dimensional real or complex signal $\mathbf{X}^\star$ from the (possibly noisy) observation of $|\mathbf{\Phi} \mathbf{X}^\star|$, in which $\mathbf{\Phi}$ is a matrix of size $m \times n$. We consider a high-dimensional setting where $n, m \to \infty$ with $m/n = \mathcal{O}(1)$, and a large class of (possibly correlated) random matrices $\mathbf{\Phi}$ and observation channels. Spectral methods are a powerful tool to obtain approximate observations of the signal $\mathbf{X}^\star$, which can then be used as initialization for a subsequent algorithm, at a low computational cost. In this paper, we extend and unify previous results and approaches on spectral methods for the phase retrieval problem. More precisely, we combine the linearization of message-passing algorithms and the analysis of the Bethe Hessian, a classical tool of statistical physics. Using this toolbox, we show how to derive optimal spectral methods for arbitrary channel noise and right-unitarily invariant matrix $\mathbf{\Phi}$, in an automated manner (i.e. with no optimization over any hyperparameter or preprocessing function).
We consider communication over a noisy network under randomized linear network coding. Possible error mechanisms include node or link failures, Byzantine behavior of nodes, or an over-estimate of the network min-cut. Building on the work of Koetter and Kschischang, we introduce a probabilistic model for errors. We compute the capacity of this channel and we define an error-correction scheme based on random sparse graphs and a low-complexity decoding algorithm. By optimizing over the code degree profile, we show that this construction achieves the channel capacity in complexity which is jointly quadratic in the number of coded information bits and sublogarithmic in the error probability.
Compared with automatic speech recognition (ASR), the human auditory system is more adept at handling noise-adverse situations, including environmental noise and channel distortion. To mimic this adeptness, auditory models have been widely incorporated in ASR systems to improve their robustness. This paper proposes a novel auditory model which incorporates psychoacoustics and otoacoustic emissions (OAEs) into ASR. In particular, we successfully implement the frequency-dependent property of psychoacoustic models and effectively improve resulting system performance. We also present a novel double-transform spectrum-analysis technique, which can qualitatively predict ASR performance for different noise types. Detailed theoretical analysis is provided to show the effectiveness of the proposed algorithm. Experiments are carried out on the AURORA2 database and show that the word recognition rate using our proposed feature extraction method is significantly increased over the baseline. Given models trained with clean speech, our proposed method achieves up to 85.39% word recognition accuracy on noisy data.
Wentao Huang, Kechen Zhang (2016)
While Shannon's mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.