
KARMA: Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking

Published by Patrick J. Wolfe
Publication date: 2011
Language: English





Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized mapping to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two benchmark algorithms, with third formant tracking more challenging. Antiformant tracking performance of KARMA is illustrated using synthesized and spoken nasal phonemes. The simultaneous tracking of uncertainty levels enables practitioners to recognize time-varying confidence in parameters of interest and adjust algorithmic settings accordingly.
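The abstract does not spell out the mapping from formant and antiformant parameters to ARMA cepstral coefficients. The sketch below assumes the standard closed form for a pole-zero model, in which each formant is a complex-conjugate pole pair and each antiformant a complex-conjugate zero pair with radius exp(-pi*b/fs) and angle 2*pi*f/fs. The function names, argument order, and example values are hypothetical and are not taken from the paper. The second function gives the partial derivatives that a linearized (extended Kalman) update would need under that same assumption.

```python
import math


def formants_to_cepstrum(formants, antiformants, fs, n_coeffs):
    """Map (frequency_hz, bandwidth_hz) pairs to cepstral coefficients c_1..c_N.

    Assumed closed form (not quoted from the paper):
        c_n = (2/n) * [ sum over formants     exp(-pi*n*b/fs) * cos(2*pi*n*f/fs)
                      - sum over antiformants exp(-pi*n*b/fs) * cos(2*pi*n*f/fs) ]
    """
    def pair_sum(pairs, n):
        return sum(
            math.exp(-math.pi * n * b / fs) * math.cos(2.0 * math.pi * n * f / fs)
            for f, b in pairs
        )

    return [
        (2.0 / n) * (pair_sum(formants, n) - pair_sum(antiformants, n))
        for n in range(1, n_coeffs + 1)
    ]


def cepstrum_partials(f, b, fs, n):
    """Partial derivatives of one formant's contribution to c_n w.r.t. f and b.

    For an antiformant the signs of both derivatives are flipped.
    """
    decay = math.exp(-math.pi * n * b / fs)
    angle = 2.0 * math.pi * n * f / fs
    d_df = -(4.0 * math.pi / fs) * decay * math.sin(angle)
    d_db = -(2.0 * math.pi / fs) * decay * math.cos(angle)
    return d_df, d_db


if __name__ == "__main__":
    # Hypothetical example: three formants, one antiformant, 8 kHz sampling.
    formants = [(500.0, 80.0), (1500.0, 120.0), (2500.0, 160.0)]
    antiformants = [(1000.0, 100.0)]
    for n, c in enumerate(formants_to_cepstrum(formants, antiformants, 8000.0, 5), 1):
        print(f"c_{n} = {c:.4f}")
```

Stacking the partial derivatives for every tracked frequency and bandwidth, one row per cepstral index, yields the Jacobian of the observation function around the current state estimate. In the all-pole case the antiformant list is simply empty.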




Read also

This paper presents a case study on short-term load forecasting for France, with emphasis on special days, such as public holidays. We investigate the generalisability to French data of a recently proposed approach, which generates forecasts for normal and special days in a coherent and unified framework, by incorporating subjective judgment in univariate statistical models using a rule-based methodology. The intraday, intraweek, and intrayear seasonality in load are accommodated using a rule-based triple seasonal adaptation of a seasonal autoregressive moving average (SARMA) model. We find that, for application to French load, the method requires an important adaptation. We also adapt a recently proposed SARMA model that accommodates special day effects on an hourly basis using indicator variables. Using a rule formulated specifically for the French load, we compare the SARMA models with a range of different benchmark methods based on an evaluation of their point and density forecast accuracy. As sophisticated benchmarks, we employ the rule-based triple seasonal adaptations of Holt-Winters-Taylor (HWT) exponential smoothing and artificial neural networks (ANNs). We use nine years of half-hourly French load data, and consider lead times ranging from one half-hour up to a day ahead. The rule-based SARMA approach generated the most accurate forecasts.
In the field of signal processing on graphs, graph filters play a crucial role in processing the spectrum of graph signals. This paper proposes two different strategies for designing autoregressive moving average (ARMA) graph filters on both directed and undirected graphs. The first approach is inspired by Prony's method, which considers a modified error between the modeled and the desired frequency response. The second technique is based on an iterative approach, which finds the filter coefficients by iteratively minimizing the true error (instead of the modified error) between the modeled and the desired frequency response. The performance of the proposed algorithms is evaluated and compared with finite impulse response (FIR) graph filters, on both synthetic and real data. The obtained results show that ARMA filters outperform FIR filters in terms of approximation accuracy and they are suitable for graph signal interpolation, compression and prediction.
Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of a Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture in three respects. First, we turned off the causal mode of dilated convolution, allowing the dilated convolution to see future speech frames. Second, each hidden layer reused the output information from all the previous layers through dense connections. Third, we also adopted a gating mechanism to alleviate the vanishing gradient problem by selectively forgetting unimportant information. The model was validated on the open access formant database VTR. The experiment showed that our proposed model converged easily and achieved an overall mean absolute percent error (MAPE) of 8.2% on speech-labeled frames, compared to three competitive baselines of 9.4% (LSTM), 9.1% (Bi-LSTM) and 8.9% (TCN).
We present and validate a novel method for noise injection of arbitrary spectra in quantum circuits that can be applied to any system capable of executing arbitrary single qubit rotations, including cloud-based quantum processors. As the consequences of temporally-correlated noise on the performance of quantum algorithms are not well understood, the capability to engineer and inject such noise in quantum systems is paramount. To date, noise injection capabilities have been limited and highly platform specific, requiring low-level access to control hardware. We experimentally validate our universal method by comparing to a direct hardware-based noise-injection scheme, using a combination of quantum noise spectroscopy and classical signal analysis to show that the two approaches agree. These results showcase a highly versatile method for noise injection that can be utilized by theoretical and experimental researchers to verify, evaluate, and improve quantum characterization protocols and quantum algorithms for sensing and computing.
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets, typically done via a scoring function. Despite the great advances in MOT, designing a reliable scoring function remains a challenge. In this paper, we introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion. One key property of our model is its ability to generate multiple likely futures of a tracklet given partial observations. This allows us to not only score tracklets but also effectively maintain existing tracklets when the detector fails to detect some objects even for a long time, e.g., due to occlusion, by sampling trajectories so as to inpaint the gaps caused by misdetection. Our experiments demonstrate the effectiveness of our approach to scoring and inpainting tracklets on several MOT benchmark datasets. We additionally show the generality of our generative model by using it to produce future representations in the challenging task of human motion prediction.