Non-Gaussian component analysis (NGCA) is a problem in multidimensional data analysis which, since its formulation in 2006, has attracted considerable attention in statistics and machine learning. In this problem, we have a random variable $X$ in $n$-dimensional Euclidean space. There is an unknown subspace $\Gamma$ of the $n$-dimensional Euclidean space such that the orthogonal projection of $X$ onto $\Gamma$ is standard multidimensional Gaussian and the orthogonal projection of $X$ onto $\Gamma^{\perp}$, the orthogonal complement of $\Gamma$, is non-Gaussian, in the sense that all its one-dimensional marginals are different from the Gaussian in a certain metric defined in terms of moments. The NGCA problem is to approximate the non-Gaussian subspace $\Gamma^{\perp}$ given samples of $X$. Vectors in $\Gamma^{\perp}$ correspond to `interesting' directions, whereas vectors in $\Gamma$ correspond to directions where the data are very noisy. The most interesting applications of the NGCA model are to the case when the magnitude of the noise is comparable to that of the true signal, a setting in which traditional noise reduction techniques such as PCA do not apply directly. NGCA is also related to dimension reduction and to other data analysis problems such as ICA. NGCA-like problems have been studied in statistics for a long time using techniques such as projection pursuit. We give an algorithm that takes polynomial time in the dimension $n$ and has an inverse polynomial dependence on the error parameter measuring the angle distance between the non-Gaussian subspace and the subspace output by the algorithm. Our algorithm is based on relative entropy as the contrast function and fits under the projection pursuit framework. The techniques we develop for analyzing our algorithm may be of use for other related problems.
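To make the model concrete, the following minimal Python sketch (our illustration, not the authors' algorithm) draws samples from an NGCA-style distribution; the Laplace marginals for the non-Gaussian part, and all names, are illustrative choices.

```python
# Illustrative sketch: generate samples whose projection onto a hidden k-dimensional
# subspace (Gamma-perp) is non-Gaussian while the orthogonal complement (Gamma)
# carries standard Gaussian noise. The Laplace law is an arbitrary non-Gaussian example.
import numpy as np

def sample_ngca(num_samples, n, k, seed=0):
    rng = np.random.default_rng(seed)
    signal = rng.laplace(size=(num_samples, k))              # non-Gaussian part
    noise = rng.standard_normal(size=(num_samples, n - k))   # standard Gaussian part
    # Hide the split behind a random orthogonal change of basis Q.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    X = np.hstack([signal, noise]) @ Q.T
    # The non-Gaussian subspace Gamma-perp is spanned by the first k columns of Q.
    return X, Q[:, :k]

X, gamma_perp_basis = sample_ngca(num_samples=5000, n=10, k=2)
```

An NGCA algorithm would receive only X and be asked to recover an approximation (in angle distance) to the span of gamma_perp_basis.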
The entropy of a quantum system is a measure of its randomness, and has applications in measuring quantum entanglement. We study the problem of measuring the von Neumann entropy, $S(\rho)$, and Rényi entropy, $S_\alpha(\rho)$, of an unknown mixed quantum state $\rho$ in $d$ dimensions, given access to independent copies of $\rho$. We provide an algorithm with copy complexity $O(d^{2/\alpha})$ for estimating $S_\alpha(\rho)$ for $\alpha<1$, and copy complexity $O(d^{2})$ for estimating $S(\rho)$, and $S_\alpha(\rho)$ for non-integral $\alpha>1$. These bounds are at least quadratic in $d$, which is the order of the number of copies required to learn the entire state $\rho$. For integral $\alpha>1$, on the other hand, we provide an algorithm for estimating $S_\alpha(\rho)$ with a sub-quadratic copy complexity of $O(d^{2-2/\alpha})$. We characterize the copy complexity for integral $\alpha>1$ up to constant factors by providing matching lower bounds. For other values of $\alpha$, and for the von Neumann entropy, we show lower bounds on the algorithms that achieve the above upper bounds; this shows that we either need new algorithms for better upper bounds, or better lower bounds to tighten the results. For non-integral $\alpha$, and for the von Neumann entropy, we consider the well-known Empirical Young Diagram (EYD) algorithm, which is the analogue of the empirical plug-in estimator in classical distribution estimation. As a corollary, we strengthen a lower bound on the copy complexity of the EYD algorithm for learning the maximally mixed state by showing that the lower bound holds with exponential probability (it was previously known to hold only with constant probability). For integral $\alpha>1$, we provide new concentration results for certain polynomials that arise in the Kerov algebra of Young diagrams.
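For reference, the two entropies above are the standard von Neumann and Rényi entropies of a density matrix $\rho$ (definitions added here for readability):
\[
S(\rho) = -\operatorname{Tr}(\rho \log \rho),
\qquad
S_\alpha(\rho) = \frac{1}{1-\alpha}\log \operatorname{Tr}(\rho^{\alpha}), \quad \alpha>0,\ \alpha\neq 1,
\]
with $S_\alpha(\rho)\to S(\rho)$ as $\alpha\to 1$.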
Functional principal component analysis (FPCA) can become invalid when the data involve non-Gaussian features. Therefore, we aim to develop a general FPCA method that adapts to such non-Gaussian cases. A Kendall's $\tau$ function, which possesses the same eigenfunctions as the covariance function, is constructed. The particular formulation of the Kendall's $\tau$ function makes it less sensitive to the data distribution. We further apply it to the estimation of FPCA and study the corresponding asymptotic consistency. Moreover, the effectiveness of the proposed method is demonstrated through a comprehensive simulation study and an application to physical activity data collected by a wearable accelerometer monitor.
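As a rough illustration of the idea (our own simplification on a dense grid; the paper's construction is functional and also accounts for measurement error), a Kendall's $\tau$-type matrix can be built from normalized pairwise differences, and its eigenvectors used in place of those of the sample covariance:

```python
# Illustrative sketch: multivariate Kendall's tau-type matrix for discretized curves.
# Its leading eigenvectors serve as robust surrogates for the covariance eigenvectors.
# This is a simplified stand-in, not the authors' estimator.
import numpy as np

def kendall_tau_matrix(X):
    """X: (n_curves, n_gridpoints) matrix of densely observed curves."""
    n, p = X.shape
    K = np.zeros((p, p))
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            norm_sq = diff @ diff
            if norm_sq > 0:
                K += np.outer(diff, diff) / norm_sq
    return 2.0 * K / (n * (n - 1))

# Heavy-tailed example data; eigenvectors of K replace those of the sample covariance.
X = np.random.default_rng(1).standard_t(df=3, size=(200, 50))
eigvals, eigvecs = np.linalg.eigh(kendall_tau_matrix(X))
```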
Functional principal component analysis is essential in functional data analysis, but the inferences become unconvincing when non-Gaussian characteristics occur, such as heavy tails and skewness. The focus of this paper is to develop a robust functional principal component analysis methodology for non-Gaussian longitudinal data, for which sparsity and irregularity along with non-negligible measurement errors must be considered. We introduce a Kendall's $\tau$ function whose particular properties make it a nice proxy for the covariance function in the eigenequation when handling non-Gaussian cases. Moreover, the estimation procedure is presented and the asymptotic theory is established. We further demonstrate the superiority and robustness of our method through simulation studies and apply the method to the longitudinal CD4 cell count data in an AIDS study.
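Concretely, the proxy enters through the usual FPCA eigenequation, with the Kendall's $\tau$ function $K(s,t)$ standing in for the covariance function (schematic form in our notation):
\[
\int_{\mathcal{T}} K(s,t)\,\phi_k(t)\,dt = \lambda_k\,\phi_k(s), \qquad k = 1,2,\ldots,
\]
so the eigenfunctions $\phi_k$ extracted from $K$ agree with those of the covariance function, while the estimate of $K$ is less affected by heavy-tailed or skewed observations.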
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance. Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the entire gamut of distributions with bounded variance, including those without any higher moments. Parameterized by the sample size $n$, the failure probability $\delta$, and the variance $\sigma^2$, our estimator is accurate to within $\sigma\cdot(1+o(1))\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$, which is tight up to the $1+o(1)$ factor. Our estimator construction and analysis give a framework generalizable to other problems: tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter $\psi$-estimator, and constructing bounds using mathematical programming and duality techniques.
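For comparison, the Gaussian benchmark referenced above follows from the standard Gaussian tail bound: if $X_1,\ldots,X_n\sim\mathcal{N}(\mu,\sigma^2)$, then $\bar{X}\sim\mathcal{N}(\mu,\sigma^2/n)$ and
\[
\Pr\left[\bar{X}-\mu \ge \sigma\sqrt{\tfrac{2\log\frac{1}{\delta}}{n}}\right]
\le \exp\left(-\tfrac{n}{2\sigma^2}\cdot\sigma^2\cdot\tfrac{2\log\frac{1}{\delta}}{n}\right) = \delta,
\]
which is exactly the accuracy that the estimator above matches, up to the $1+o(1)$ factor, for arbitrary distributions with finite variance.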
Performance of nuclear threat detection systems based on gamma-ray spectrometry often strongly depends on the ability to identify the part of the measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra, using PCA projection residuals to derive a source detection score. We have shown the method's utility in a threat detection system using mobile spectrometers in urban scenes (Tandon et al., 2012). While it is commonly assumed that measured photon counts follow a Poisson process, standard PCA makes a Gaussian assumption about the data distribution, which may be a poor approximation when photon counts are low. This paper studies whether and under what conditions PCA with a Poisson-based loss function (Poisson PCA) can outperform standard Gaussian PCA in modeling background radiation, to enable more sensitive and specific nuclear threat detection.
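As an illustration of the residual-based scoring idea (a simplified sketch of standard Gaussian PCA, not the deployed system; a Poisson PCA variant would replace the squared-error reconstruction criterion with a Poisson log-likelihood):

```python
# Minimal sketch of PCA-based background modeling for gamma-ray spectra: fit principal
# components on background-only spectra, then score a new spectrum by the norm of its
# residual outside the background subspace. Simplified illustration only.
import numpy as np

def fit_background_model(background_spectra, num_components):
    """background_spectra: (n_measurements, n_channels) array of photon counts."""
    mean = background_spectra.mean(axis=0)
    # Principal directions from the SVD of the centered background data.
    _, _, vt = np.linalg.svd(background_spectra - mean, full_matrices=False)
    return mean, vt[:num_components]

def detection_score(spectrum, mean, components):
    """Norm of the projection residual; large values suggest a non-background source."""
    centered = spectrum - mean
    residual = centered - components.T @ (components @ centered)
    return np.linalg.norm(residual)

rng = np.random.default_rng(2)
background = rng.poisson(lam=50.0, size=(500, 128)).astype(float)
mean, components = fit_background_model(background, num_components=10)
score = detection_score(rng.poisson(lam=50.0, size=128).astype(float), mean, components)
```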