We consider the "searching for a trail in a maze" composite hypothesis testing problem, in which one attempts to detect an anomalous directed path in a 2D lattice box of side $n$ based on observations on the nodes of the box. Under the signal hypothesis, one observes independent Gaussian variables of unit variance at all nodes, with mean zero off the anomalous path and mean $\mu_n$ on it. Under the null hypothesis, one observes i.i.d. standard Gaussians on all nodes. Arias-Castro et al. (2008) showed that if the unknown directed path under the signal hypothesis has a known initial location, then detection is possible (in the minimax sense) if $\mu_n \gg 1/\sqrt{\log n}$, while it is not possible if $\mu_n \ll 1/(\log n \sqrt{\log\log n})$. In this paper, we show that this result continues to hold even when the initial location of the unknown path is not known. As in Arias-Castro et al. (2008), the upper bound here also applies when the path is undirected. The improvement is achieved by replacing the linear detection statistic used in Arias-Castro et al. (2008) with a polynomial statistic, which is obtained by applying a multi-scale analysis to a quadratic statistic to bootstrap its performance. Our analysis is motivated by ideas developed in the analysis of random polymers in Lacoin (2010).
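The two hypotheses described above can be made concrete with a small simulation. The sketch below is only illustrative and rests on assumptions that the abstract does not state: a directed path is taken to visit one node per column, moving its row by +1 or -1 at each step and stepping back inward at the boundary, with a random starting row. The function names `sample_directed_path` and `sample_field` are hypothetical, and no detection statistic from the paper is implemented.

```python
import random

def sample_directed_path(n, rng):
    """Return rows r_0..r_{n-1}, one per column of an n x n box (n >= 2).

    Assumption for illustration: each step changes the row by +1 or -1,
    and a step that would leave the box is flipped inward.
    """
    row = rng.randrange(n)          # unknown (random) initial location
    rows = [row]
    for _ in range(n - 1):
        step = rng.choice((-1, 1))
        if not 0 <= row + step < n:
            step = -step            # stay inside the box
        row += step
        rows.append(row)
    return rows

def sample_field(n, mu, noise_rng, path=None):
    """Return an n x n grid of independent unit-variance Gaussians.

    With path=None this is the null hypothesis (all means zero).
    Otherwise the mean is raised to mu on the nodes of the path.
    """
    field = [[noise_rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    if path is not None:
        for col, row in enumerate(path):
            field[row][col] += mu
    return field

if __name__ == "__main__":
    n, mu = 16, 0.5
    path = sample_directed_path(n, random.Random(1))
    null_field = sample_field(n, mu, random.Random(7))
    signal_field = sample_field(n, mu, random.Random(7), path)
    print("path rows:", path)
    print("total shift:", sum(map(sum, signal_field)) - sum(map(sum, null_field)))
```

Because the same noise seed is used for both fields, the two grids differ only on the n path nodes, which makes the weakness of the signal visible: the total shift is n times mu against noise of order n in the sum over all n^2 nodes.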
This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on the risk of the method are found subject to membership of the unknown function in one of a wide range of Besov classes, covering also the case of $f$ of bounded variation. The rates obtained are optimal for any value of the parameter $p$ in $(0,\infty]$, simultaneously for a wide range of loss functions, each dominating the $L_q$ norm of the $\sigma$th derivative, with $\sigma \ge 0$ and $0 < q \le 2$.
Environments with immobile obstacles or void regions that inhibit and alter the motion of individuals within that environment are ubiquitous. Correlation in the location of individuals within such environments arises as a combination of the mechanisms governing individual behavior and the heterogeneous structure of the environment. Measures of spatial structure and correlation have been successfully implemented to elucidate the roles of the mechanisms underpinning the behavior of individuals. In particular, the pair correlation function has been used across biology, ecology and physics to obtain quantitative insight into a variety of processes. However, naively applying standard pair correlation functions in the presence of obstacles may fail to detect correlation, or suggest false correlations, due to a reliance on a distance metric that does not account for obstacles. To overcome this problem, here we present an analytic expression for calculating a corrected pair correlation function for lattice-based domains containing obstacles. We demonstrate that this corrected pair correlation function is necessary for isolating the correlation associated with the behavior of individuals, rather than the structure of the environment. Using simulations that mimic cell migration and proliferation we demonstrate that the corrected pair correlation function recovers the short-range correlation known to be present in this process, independent of the heterogeneous structure of the environment. Further, we show that the analytic calculation of the corrected pair correlation derived here is significantly faster to implement than the corresponding numerical approach.
Data observed at high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or a smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally spaced observation times, is consistent and minimax rate optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and a real data analysis validate our analysis.
A new vision in multidimensional statistics is proposed, with impact on several areas of application. In these applications, a set of noisy measurements characterizing the repeatable response of a process is known as a realization and can be seen as a single point in $\mathbb{R}^N$. The projections of this point on the $N$ axes correspond to the $N$ measurements. The contemporary vision of a diffuse cloud of realizations distributed in $\mathbb{R}^N$ is replaced by a cloud in the shape of a shell surrounding a topological manifold. This manifold corresponds to the process's stabilized-response domain observed without the measurement noise. The measurement noise, which accumulates over several dimensions, distances each realization from the manifold. The probability density function (PDF) of the realization-to-manifold distance creates the shell. Considering the central limit theorem as the number of dimensions increases, the PDF tends toward the normal distribution $N(\mu, \sigma^2)$, where $\mu$ fixes the location of the shell center and $\sigma$ fixes the shell thickness. In this vision, the likelihood of a realization is a function of the realization-to-shell distance rather than the realization-to-manifold distance. The demonstration begins with the work of Claude Shannon, followed by the introduction of the shell manifold, and ends with practical applications to monitoring equipment.
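The shell picture can be checked numerically in the simplest possible setting. The sketch below assumes, purely for illustration, that the manifold is reduced to a single point and that the noise is i.i.d. Gaussian on every axis; neither assumption comes from the abstract, and the function name `distances_to_point` is hypothetical. Under these assumptions the distance follows a scaled chi distribution, whose mean is close to `noise_sd * sqrt(N)` and whose standard deviation is close to `noise_sd / sqrt(2)` for large N, so the realizations sit in a thin shell rather than a diffuse cloud.

```python
import math
import random
import statistics

def distances_to_point(n_dims, n_realizations, noise_sd, rng):
    """Euclidean distance of each noisy realization from the noise-free point.

    Assumption for illustration: the manifold is a single point, so the
    realization-to-manifold distance is just the norm of the noise vector.
    """
    return [
        math.sqrt(sum(rng.gauss(0.0, noise_sd) ** 2 for _ in range(n_dims)))
        for _ in range(n_realizations)
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    for n_dims in (4, 100, 400):
        d = distances_to_point(n_dims, 300, 1.0, rng)
        # Shell center grows like sqrt(N); shell thickness stays roughly constant.
        print(n_dims, round(statistics.mean(d), 2), round(statistics.pstdev(d), 2))
```

The design choice to print both the mean and the spread is deliberate: the ratio of thickness to radius shrinks as N grows, which is what makes the shell description more informative than the distance to the manifold alone.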
We consider statistical inference for a noisy, incomplete 1-bit matrix. Instead of observing a subset of real-valued entries of a matrix $M$, we only have one binary (1-bit) measurement for each entry in this subset, where the binary measurement follows a Bernoulli distribution whose success probability is determined by the value of the entry. Despite the importance of uncertainty quantification to matrix completion, most of the categorical matrix completion literature focuses on point estimation and prediction. This paper moves one step further towards statistical inference for 1-bit matrix completion. Under a popular nonlinear factor analysis model, we obtain a point estimator and derive its asymptotic distribution for any linear form of $M$ and latent factor scores. Moreover, our analysis adopts a flexible missing-entry design that does not require a random sampling scheme, as required by most of the existing asymptotic results for matrix completion. The proposed estimator is statistically efficient and optimal, in the sense that the Cramér-Rao lower bound is achieved asymptotically for the model parameters. Two applications are considered: (1) linking two forms of an educational test and (2) linking the roll call voting records from multiple years in the United States Senate. The first application enables the comparison between examinees who took different test forms, and the second application allows us to compare the liberal-conservativeness of senators who did not serve in the Senate at the same time.
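The observation model described above can be sketched as follows. The abstract says only that the success probability is determined by the entry value; the logistic link used here is an assumption made for illustration, as are the rank-one form of `M` in the usage example and the function names `logistic` and `observe_one_bit`. The sketch generates data and does not implement the paper's estimator.

```python
import math
import random

def logistic(x):
    """Logistic link (assumed here): maps an entry of M to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def observe_one_bit(M, observed, rng):
    """Return a matrix of 1-bit measurements.

    For each entry with observed[i][j] true, draw a Bernoulli bit with
    success probability logistic(M[i][j]); unobserved entries are None.
    The mask is supplied by the caller, so it need not be random.
    """
    return [
        [
            (1 if rng.random() < logistic(M[i][j]) else 0) if observed[i][j] else None
            for j in range(len(M[i]))
        ]
        for i in range(len(M))
    ]

if __name__ == "__main__":
    # Hypothetical rank-one example: M[i][j] = theta[i] * a[j].
    theta = [-1.0, 0.0, 1.5]
    a = [0.5, 2.0]
    M = [[t * x for x in a] for t in theta]
    mask = [[True, False], [True, True], [False, True]]
    print(observe_one_bit(M, mask, random.Random(3)))
```

Passing the mask explicitly mirrors the flexible missing-entry design mentioned in the abstract: any fixed pattern of observed entries can be used, for example blocks of respondents who each saw only one test form.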