
Using data-compressors for statistical analysis of problems on homogeneity testing and classification

Added by Boris Ryabko
Publication date: 2017
Language: English




Nowadays, data compressors are applied to many problems of text analysis, but many such applications are developed outside the framework of mathematical statistics. In this paper we bridge this gap and show how several methods of classical mathematical statistics can be developed on the basis of data compressors.
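As a rough illustration of the idea (the paper's own constructions may differ), a widely used compression-based statistic is the normalized compression distance (NCD) of Cilibrasi and Vitányi, which can drive nearest-neighbour classification and, via permutation testing, homogeneity testing. The sketch below uses Python's zlib; the class labels and reference texts are invented for the example.

import zlib

def clen(b: bytes) -> int:
    # Length of the zlib-compressed representation of b.
    return len(zlib.compress(b, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: small when x and y share structure.
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)

def classify(sample: bytes, references: dict) -> str:
    # 1-nearest-neighbour classification under NCD;
    # references maps a class label to a reference text of that class.
    return min(references, key=lambda label: ncd(sample, references[label]))

if __name__ == "__main__":
    refs = {"english": b"the quick brown fox jumps over the lazy dog " * 20,
            "digits": b"3141592653589793238462643383279502884197 " * 20}
    print(classify(b"pack my box with five dozen liquor jugs", refs))

The same statistic supports a simple homogeneity test: compute the NCD between the two samples and compare it against its distribution under random re-splits of the pooled data.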



Related research

Tomohiro Nishiyama, 2020
The divergence minimization problem plays an important role in various fields. In this note, we focus on differentiable and strictly convex divergences. For several minimization problems, we derive conditions characterizing the minimizer and establish its uniqueness without assuming a specific form for the divergence. Furthermore, we establish geometric properties related to these minimization problems.
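For orientation (stated here in a generic form that may differ from the note's exact setting), the first-order condition for a differentiable convex divergence $D(\cdot, q)$ minimized over a closed convex set $C$ is the variational inequality

$p^{*} = \arg\min_{p \in C} D(p, q) \iff \langle \nabla_{p} D(p^{*}, q),\, p - p^{*} \rangle \ge 0 \quad \text{for all } p \in C,$

and strict convexity of $D(\cdot, q)$ then forces the minimizer $p^{*}$ to be unique.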
This paper presents results pertaining to sequential methods for support recovery of sparse signals in noise. Specifically, we show that any sequential measurement procedure fails provided the average number of measurements per dimension grows more slowly than $\log s / D(f_0 \| f_1)$, where $s$ is the level of sparsity and $D(f_0 \| f_1)$ is the Kullback-Leibler divergence between the underlying distributions. For comparison, we show that any non-sequential procedure fails provided the number of measurements grows at a rate less than $\log n / D(f_1 \| f_0)$, where $n$ is the total dimension of the problem. Lastly, we show that a simple procedure termed sequential thresholding guarantees exact support recovery provided the average number of measurements per dimension grows faster than $(\log s + \log\log n) / D(f_0 \| f_1)$, a mere additive factor more than the lower bound.
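The sequential thresholding procedure mentioned above admits a very short simulation. The sketch below is an assumption-laden illustration (Gaussian observations with a positive mean on the support, threshold at zero, roughly $\log_2 n$ passes), not the paper's exact specification.

import numpy as np

def sequential_thresholding(measure, n, passes, thresh=0.0):
    # Each pass measures every surviving component once and discards
    # those falling below the threshold; the survivors of all passes
    # form the support estimate.
    survivors = np.arange(n)
    for _ in range(passes):
        obs = np.array([measure(i) for i in survivors])
        survivors = survivors[obs > thresh]
    return survivors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, s, mu = 1000, 10, 3.0          # dimension, sparsity, signal mean
    support = rng.choice(n, size=s, replace=False)
    x = np.zeros(n)
    x[support] = mu
    est = sequential_thresholding(lambda i: x[i] + rng.standard_normal(),
                                  n, passes=int(np.ceil(np.log2(n))))
    print(sorted(est.tolist()), sorted(support.tolist()))

Because the survivor set shrinks geometrically for null components, the average number of measurements per dimension stays close to a small constant plus the cost of repeatedly probing the true support.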
Daniil Ryabko, 2012
This work proposes a method for statistical analysis of time series that yields solutions to some classical problems of mathematical statistics under the only assumption that the process generating the data is stationary ergodic. Namely, three problems are considered: goodness-of-fit (or identity) testing, process classification, and the change point problem. For each of the problems a test is constructed that is asymptotically accurate when the data is generated by stationary ergodic processes. The tests are based on empirical estimates of distributional distance.
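A distributional distance of this kind can be estimated from frequencies of words of increasing length. The sketch below is a minimal, assumption-laden version for binary strings; the weights $2^{-k}$ and the cutoff kmax are chosen here for illustration rather than taken from the paper.

from collections import Counter

def word_freqs(seq: str, k: int) -> Counter:
    # Empirical frequencies of all length-k substrings of seq.
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return Counter({w: c / total for w, c in counts.items()})

def empirical_distance(x: str, y: str, kmax: int = 5) -> float:
    # Weighted sum, over word lengths k, of the L1 difference between
    # empirical word frequencies of the two samples.
    d = 0.0
    for k in range(1, kmax + 1):
        fx, fy = word_freqs(x, k), word_freqs(y, k)
        d += 2.0 ** (-k) * sum(abs(fx[w] - fy[w]) for w in set(fx) | set(fy))
    return d

print(empirical_distance("01" * 100, "0011" * 50))   # different processes
print(empirical_distance("01" * 100, "01" * 100))    # identical samples: 0.0

A goodness-of-fit or homogeneity test then rejects when this statistic exceeds a threshold that shrinks suitably with the sample length.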
Minimization problems with respect to a one-parameter family of generalized relative entropies are studied. These relative entropies, which we term relative $\alpha$-entropies (denoted $\mathscr{I}_{\alpha}$), arise as redundancies under mismatched compression when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the usual relative entropy (Kullback-Leibler divergence). Just like relative entropy, these relative $\alpha$-entropies behave like squared Euclidean distance and satisfy the Pythagorean property. Minimizers of these relative $\alpha$-entropies on closed and convex sets are shown to exist. Such minimizations generalize the maximum Rényi or Tsallis entropy principle. The minimizing probability distribution (termed the forward $\mathscr{I}_{\alpha}$-projection) for a linear family is shown to obey a power law. Other results in connection with statistical inference, namely subspace transitivity and iterated projections, are also established. In a companion paper, a related minimization problem of interest in robust statistics that leads to a reverse $\mathscr{I}_{\alpha}$-projection is studied.
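For reference, here is the Pythagorean property in question, stated for the ordinary relative entropy of which the $\mathscr{I}_{\alpha}$ family is a generalization: if $P^{*}$ is the forward projection of $Q$ on a closed convex set $C$, i.e. $P^{*} = \arg\min_{P \in C} D(P \| Q)$, then

$D(P \| Q) \ge D(P \| P^{*}) + D(P^{*} \| Q) \quad \text{for all } P \in C,$

which is exactly the inequality one gets for squared Euclidean distance when projecting onto a convex set.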
The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.
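One simple and well-known way to instantiate such a component-selection strategy, offered purely as an illustration rather than as the scheme analyzed in the paper, is to run a sequential probability ratio test on each component in turn: probe a component until its cumulative log-likelihood ratio exits an interval, and declare the system unsafe on any upper crossing. All names, thresholds, and distributions below are assumptions of the sketch.

import random

def verify_safe(probe, n, llr, upper=4.6, lower=-4.6, max_obs=10_000):
    # probe(i) draws one observation from component i; llr(obs) is
    # log f1(obs)/f0(obs), anomalous versus nominal. Each component is
    # probed until its cumulative LLR leaves the interval (lower, upper).
    for i in range(n):
        s, t = 0.0, 0
        while lower < s < upper and t < max_obs:
            s += llr(probe(i))
            t += 1
        if s >= upper:
            return False   # this component looks anomalous
    return True            # every component passed the test

mu = 1.0   # anomalous components shift the mean: N(mu, 1) versus N(0, 1)
probe = lambda i: random.gauss(mu if i == 3 else 0.0, 1.0)
print(verify_safe(probe, n=10, llr=lambda x: mu * x - mu * mu / 2))

The thresholds $\pm 4.6 \approx \ln 100$ correspond, via the usual Wald approximations, to per-component error probabilities of roughly one percent; the non-asymptotic analysis in the paper is of course much finer.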