
A Better Good-Turing Estimator for Sequence Probabilities

Published by Aaron Wagner
Publication date: 2007
Research field: Informatics Engineering
Language: English





We consider the problem of estimating the probability of an observed string drawn i.i.d. from an unknown distribution. The key feature of our study is that the length of the observed string is assumed to be of the same order as the size of the underlying alphabet. In this setting, many letters are unseen and the empirical distribution tends to overestimate the probability of the observed letters. To overcome this problem, the traditional approach to probability estimation is to use the classical Good-Turing estimator. We introduce a natural scaling model and use it to show that the Good-Turing sequence probability estimator is not consistent. We then introduce a novel sequence probability estimator that is indeed consistent under the natural scaling model.
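For readers new to the baseline: the classical Good-Turing estimator gives a letter seen $k$ times in a string of length $n$ the probability $(k+1)\varphi_{k+1}/(n\varphi_k)$, where $\varphi_k$ is the number of distinct letters seen exactly $k$ times, and estimates the total probability of unseen letters as $\varphi_1/n$. The Python sketch below plugs these per-letter values into the i.i.d. product to obtain one possible sequence probability estimate, alongside the empirical (maximum-likelihood) value. It is not the paper's new estimator. Because $\varphi_{k+1}=0$ for the most frequent count, the raw formula always assigns that letter probability zero; falling back to the empirical $k/n$ in that case is our own assumption, one simple choice among the smoothing rules used in practice. All function names are hypothetical.

```python
import math
from collections import Counter


def prevalences(seq):
    """phi[k] = number of distinct letters that appear exactly k times in seq."""
    return Counter(Counter(seq).values())


def missing_mass(seq):
    """Good-Turing estimate of the total probability of unseen letters: phi_1 / n."""
    return prevalences(seq)[1] / len(seq)


def log_prob_empirical(seq):
    """Log-probability of seq under its own empirical distribution."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())


def log_prob_good_turing(seq):
    """Log of an i.i.d. sequence probability built from Good-Turing letter estimates.

    A letter seen k times gets (k + 1) * phi[k + 1] / (n * phi[k]).
    ASSUMPTION: when phi[k + 1] == 0 we fall back to the empirical k / n,
    because the raw formula would give that letter probability zero.
    """
    n = len(seq)
    phi = prevalences(seq)
    total = 0.0
    for k, phi_k in phi.items():
        if phi[k + 1] > 0:  # Counter returns 0 for missing keys without inserting them
            p = (k + 1) * phi[k + 1] / (n * phi_k)
        else:
            p = k / n  # hypothetical fallback, not part of the classical formula
        total += k * phi_k * math.log(p)  # phi_k letters, each occurring k times
    return total


seq = "abracadabra"
print(missing_mass(seq))  # 2/11: 'c' and 'd' each appear once
print(log_prob_empirical(seq))
print(log_prob_good_turing(seq))
```

Working in log space avoids underflow for long strings, which matters in the regime the abstract studies, where the string length is comparable to the alphabet size.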




Read also

Consider a channel $\mathbf{Y}=\mathbf{X}+\mathbf{N}$, where $\mathbf{X}$ is an $n$-dimensional random vector and $\mathbf{N}$ is a Gaussian vector with covariance matrix $\mathsf{K}_{\mathbf{N}}$. The object under consideration in this paper is the conditional mean of $\mathbf{X}$ given $\mathbf{Y}=\mathbf{y}$, that is, the map $\mathbf{y} \to E[\mathbf{X}|\mathbf{Y}=\mathbf{y}]$. Several identities in the literature connect $E[\mathbf{X}|\mathbf{Y}=\mathbf{y}]$ to other quantities such as the conditional variance, score functions, and higher-order conditional moments. The objective of this paper is to provide a unifying view of these identities. In the first part of the paper, a general derivative identity for the conditional mean is derived. Specifically, for the Markov chain $\mathbf{U} \leftrightarrow \mathbf{X} \leftrightarrow \mathbf{Y}$, it is shown that the Jacobian of $E[\mathbf{U}|\mathbf{Y}=\mathbf{y}]$ is given by $\mathsf{K}_{\mathbf{N}}^{-1} \mathbf{Cov}(\mathbf{X}, \mathbf{U} | \mathbf{Y}=\mathbf{y})$. In the second part of the paper, via various choices of $\mathbf{U}$, the new identity is used to generalize many of the known identities and to derive some new ones. First, a simple proof of the Hatsell and Nolte identity for the conditional variance is shown. Second, a simple proof of the recursive identity due to Jaffer is provided. Third, a new connection between the conditional cumulants and the conditional expectation is shown. In particular, it is shown that the $k$-th derivative of $E[X|Y=y]$ is the $(k+1)$-th conditional cumulant. The third part of the paper considers some applications. In a first application, the power series and the compositional inverse of $E[X|Y=y]$ are derived. In a second application, the distribution of the estimation error $(X-E[X|Y])$ is derived. In a third application, we construct consistent estimators (empirical Bayes estimators) of the conditional cumulants from an i.i.d. sequence $Y_1,\ldots,Y_n$.
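In the scalar case with $U = X$ and $N \sim \mathcal{N}(0,\sigma^2)$, the Jacobian identity above reduces to $\frac{d}{dy}E[X|Y=y] = \operatorname{Var}(X|Y=y)/\sigma^2$. The Python sketch below checks this numerically for a discretely supported $X$: it computes the posterior by Bayes' rule and the derivative by a central finite difference. The discrete prior, the step size `h`, and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def posterior(y, support, prior, sigma2):
    """P(X = x | Y = y) for each x in support, with Y = X + N and N ~ Normal(0, sigma2)."""
    w = [p * math.exp(-((y - x) ** 2) / (2 * sigma2)) for x, p in zip(support, prior)]
    z = sum(w)
    return [wi / z for wi in w]


def cond_mean(y, support, prior, sigma2):
    """E[X | Y = y]."""
    post = posterior(y, support, prior, sigma2)
    return sum(x * q for x, q in zip(support, post))


def cond_var(y, support, prior, sigma2):
    """Var(X | Y = y)."""
    post = posterior(y, support, prior, sigma2)
    m = sum(x * q for x, q in zip(support, post))
    return sum((x - m) ** 2 * q for x, q in zip(support, post))


def mean_derivative(y, support, prior, sigma2, h=1e-5):
    """Central finite-difference approximation of d/dy E[X | Y = y]."""
    return (cond_mean(y + h, support, prior, sigma2)
            - cond_mean(y - h, support, prior, sigma2)) / (2 * h)


support, prior, sigma2 = [-1.0, 0.5, 2.0], [0.3, 0.5, 0.2], 0.7
for y in (-1.0, 0.0, 0.8):
    lhs = mean_derivative(y, support, prior, sigma2)
    rhs = cond_var(y, support, prior, sigma2) / sigma2
    print(f"y={y:+.1f}  dE/dy={lhs:.6f}  Var/sigma2={rhs:.6f}")
```

As a sanity check, with `support=[-1, 1]`, equal priors, and `sigma2=1`, `cond_mean` reduces to $\tanh(y)$, whose derivative $1-\tanh^2 y$ is exactly the conditional variance.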
In this paper, time delay estimation techniques robust to narrowband interference (NBI) are proposed. Because wireless signal interference is now pervasive, NBI is a common problem for communication and positioning systems. To mitigate its effect, we propose a robust time delay estimator for a predetermined, repeated synchronization signal in an NBI environment. We exploit an ensemble of average and sample covariance matrices to estimate the noise profile. In addition, to increase the detection probability, we suppress the variance of the likelihood value by employing a von Mises distribution in the time delay estimator. The proposed estimator outperforms a typical time delay estimator in an NBI environment.
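The abstract compares against "a typical time delay estimator" without naming it. A common baseline, assumed here, picks the lag that maximizes the cross-correlation between the received samples and the known synchronization sequence; it has no interference suppression, which is the weakness the proposed method targets. The Python sketch uses real-valued samples and a hypothetical function name `xcorr_delay`; it does not implement the covariance-ensemble or von Mises steps.

```python
import math
import random


def xcorr_delay(received, sync):
    """Return the lag d maximizing sum_i received[d + i] * sync[i] (correlation peak)."""
    m = len(sync)
    best_lag, best_score = 0, -math.inf
    for d in range(len(received) - m + 1):
        score = sum(received[d + i] * sync[i] for i in range(m))
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag


rng = random.Random(1)
sync = [rng.choice((-1.0, 1.0)) for _ in range(64)]   # hypothetical +/-1 sync sequence
true_delay = 37
received = [rng.gauss(0.0, 0.5) for _ in range(200)]  # white Gaussian noise only, no NBI
for i, s in enumerate(sync):
    received[true_delay + i] += s
print(xcorr_delay(received, sync))
```

A strong narrowband interferer adds a large sinusoidal term to `received`, which can raise the correlation at wrong lags; that is the failure mode the paper's noise-profile estimation is meant to address.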
Liang Wu, Yiming Ding (2015)
A class of statistical estimators $\hat H = (\hat H_1, \ldots, \hat H_d)$ for the Hurst parameters $H = (H_1, \ldots, H_d)$ of a fractional Brownian field is proposed, based on multi-dimensional wavelet analysis and least squares; the estimators are asymptotically normal. These estimators can be used to detect self-similarity and long-range dependence in multi-dimensional signals, which is important in texture classification and in improving diffusion tensor imaging (DTI) in nuclear magnetic resonance (NMR). Fractional Brownian sheets are simulated, and the simulated data are used to validate the estimators. We find that the estimators are efficient when $H_i \geq 1/2$ and exhibit some bias when $H_i < 1/2$.
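The wavelet-plus-least-squares idea is easiest to see in one dimension. For a self-similar process with Hurst index $H$, the variance of $L^2$-normalized wavelet detail coefficients at dyadic scale $2^j$ grows like $2^{j(2H+1)}$, so a least-squares fit of $\log_2$ variance against $j$ has slope $2H+1$. The Python sketch below applies this with Haar wavelets to a simulated random walk (a discretized Brownian motion, $H = 1/2$). The Haar choice, the scale range, and the function names are assumptions; it is a 1-D simplification, not the authors' multi-dimensional estimator for fractional Brownian sheets.

```python
import math
import random


def haar_detail_variance(x, j):
    """Mean squared Haar detail coefficient of path x at scale 2**j (L2-normalized)."""
    half = 2 ** (j - 1)
    block = 2 * half
    coeffs = []
    for start in range(0, len(x) - block + 1, block):
        first = sum(x[start:start + half])
        second = sum(x[start + half:start + block])
        coeffs.append((first - second) / math.sqrt(block))
    return sum(c * c for c in coeffs) / len(coeffs)


def estimate_hurst(x, scales):
    """Least-squares slope of log2(variance) against scale j; slope = 2H + 1."""
    pts = [(j, math.log2(haar_detail_variance(x, j))) for j in scales]
    jbar = sum(j for j, _ in pts) / len(pts)
    vbar = sum(v for _, v in pts) / len(pts)
    slope = (sum((j - jbar) * (v - vbar) for j, v in pts)
             / sum((j - jbar) ** 2 for j, _ in pts))
    return (slope - 1) / 2


rng = random.Random(0)
path, level = [], 0.0
for _ in range(2 ** 16):  # random walk of Gaussian increments, H = 1/2
    level += rng.gauss(0.0, 1.0)
    path.append(level)
print(estimate_hurst(path, range(3, 10)))
```

For this Brownian path the printed estimate should land near 0.5. Coarse scales have few coefficients and noisy variances, which is why the fit stops at $j = 9$ here.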
In this paper, we develop new fast and efficient algorithms for designing single or multiple unimodular waveforms (codes) with good auto- and cross-correlation or weighted correlation properties, which are highly desired in radar and communication systems. The waveform design is based on minimizing the integrated sidelobe level (ISL) and the weighted ISL (WISL) of the waveforms. Because the corresponding optimization problems quickly grow to large scale as the code length and the number of waveforms increase, the main issue becomes the development of fast large-scale optimization techniques. A further difficulty is that these problems are non-convex while the required accuracy is high. We therefore formulate the ISL and WISL minimization problems as non-convex quartic optimization problems in the frequency domain, and then simplify them into quadratic problems using the majorization-minimization technique, one of the basic tools for large-scale and/or non-convex optimization. In designing our fast algorithms, we identify and exploit inherent algebraic structures in the objective functions to rewrite them in quartic forms and, for WISL minimization, to derive an additional alternative quartic form that allows the quartic-quadratic transformation to be applied. Our algorithms are applicable to large-scale unimodular waveform design problems, since they are shown to have lower or comparable computational burden (analyzed theoretically) and faster convergence (confirmed by comprehensive simulations) than state-of-the-art algorithms. In addition, the waveforms designed by our algorithms demonstrate better correlation properties than their counterparts.
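For a unimodular code $x_0,\ldots,x_{N-1}$ with aperiodic autocorrelation $r_k = \sum_{n=0}^{N-1-k} x_{n+k} x_n^*$, the integrated sidelobe level is $\mathrm{ISL} = \sum_{k=1}^{N-1} |r_k|^2$. Since $r_0 = N$, Parseval's theorem over a $2N$-point DFT $X_p$ gives the frequency-domain form $\mathrm{ISL} = \frac{1}{4N}\sum_{p=0}^{2N-1}\left(|X_p|^2 - N\right)^2$, a quartic function of the code, which is the kind of objective the abstract refers to. The Python sketch computes both forms and checks that they agree; it does not implement the majorization-minimization step, and the function names are hypothetical. Some authors count both sides of the autocorrelation, which doubles the ISL value.

```python
import cmath
import math
import random


def isl_time(x):
    """ISL = sum_{k=1}^{N-1} |r_k|^2 with r_k = sum_n x[n + k] * conj(x[n])."""
    n = len(x)
    total = 0.0
    for k in range(1, n):
        r_k = sum(x[i + k] * x[i].conjugate() for i in range(n - k))
        total += abs(r_k) ** 2
    return total


def isl_freq(x):
    """Same ISL via a 2N-point DFT: (1 / 4N) * sum_p (|X_p|^2 - N)^2; requires |x[n]| = 1."""
    n = len(x)
    total = 0.0
    for p in range(2 * n):
        w = -2j * math.pi * p / (2 * n)
        X_p = sum(x[i] * cmath.exp(w * i) for i in range(n))
        total += (abs(X_p) ** 2 - n) ** 2
    return total / (4 * n)


barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
code = [complex(b) for b in barker13]
print(isl_time(code), round(isl_freq(code), 9))  # Barker-13: six sidelobes of magnitude 1

rng = random.Random(2)
phases = [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(32)]
print(math.isclose(isl_time(phases), isl_freq(phases), rel_tol=1e-9))
```

The direct DFT keeps the sketch dependency-free at $O(N^2)$ cost; at the large scales the abstract targets, an FFT would be used instead.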
The problem of finding good linear codes for joint source-channel coding (JSCC) is investigated in this paper. Using the code-spectrum approach, it was proved in the authors' previous paper that a good linear code for the authors' JSCC scheme is a code with a good joint spectrum, so the main task of this paper is to construct linear codes with good joint spectra. First, the code-spectrum approach is developed further to facilitate the calculation of spectra. Second, some general principles for constructing good linear codes are presented. Finally, we propose an explicit construction of linear codes with good joint spectra based on low-density parity-check (LDPC) codes and low-density generator-matrix (LDGM) codes.
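The code-spectrum approach builds on the familiar weight spectrum of a linear code: the number $A_w$ of codewords of Hamming weight $w$. The Python sketch below enumerates it by brute force for a small binary linear code given by a generator matrix, here the systematic $[7,4]$ Hamming code. It illustrates only this basic notion; the joint spectra used in the paper and its LDPC/LDGM construction are not reproduced, and the function name is hypothetical.

```python
from itertools import product


def weight_spectrum(G):
    """A[w] = number of codewords u*G (mod 2) of Hamming weight w, enumerating all u."""
    k, n = len(G), len(G[0])
    A = [0] * (n + 1)
    for u in product((0, 1), repeat=k):
        codeword = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        A[sum(codeword)] += 1
    return A


# Systematic generator matrix [I_4 | P] of the [7,4] Hamming code
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]
print(weight_spectrum(G))  # [1, 0, 0, 7, 7, 0, 0, 1]
```

Brute force needs $2^k$ encodings, so it is practical only for small dimension $k$; for LDPC and LDGM ensembles, spectra are instead analyzed on average over the ensemble.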