This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [49] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [49] as special cases. This includes asymptotic and non-asymptotic convergence results with natural and easily verifiable conditions, as well as explicit bounds on the convergence rates. Importantly, the theory is also general in that it can be applied to other intractable optimisation problems. A main novelty of the work is that the stochastic gradient estimates of our scheme are constructed from inexact proximal Markov chain Monte Carlo samplers. This allows the use of samplers that scale efficiently to large problems and for which we have precise theoretical guarantees.
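To make the flavour of such a scheme concrete, here is a minimal Python sketch of a generic stochastic approximation proximal gradient loop for a scalar regularization parameter; the names (`g`, `mcmc_step`) and the homogeneity assumption on the regularizer are illustrative stand-ins, not the algorithms of [49].

```python
import numpy as np

# Hypothetical sketch of a stochastic approximation loop for estimating a
# scalar regularization parameter theta by marginal maximum likelihood.
# Assumes a prior p(x | theta) ∝ exp(-theta * g(x)) with g alpha-positively
# homogeneous, so that d/dtheta log Z(theta) = -d / (alpha * theta).
def sapg(y, g, mcmc_step, d, alpha=1.0, theta0=1.0,
         theta_min=1e-3, theta_max=1e3, n_iter=1000):
    theta, x = theta0, np.zeros(d)
    thetas = []
    for n in range(n_iter):
        # One (possibly inexact) proximal MCMC move targeting p(x | y, theta).
        x = mcmc_step(x, y, theta)
        # Noisy estimate of the marginal log-likelihood gradient:
        # grad_theta log p(y | theta) = d/(alpha*theta) - E[g(X) | y, theta].
        grad = d / (alpha * theta) - g(x)
        # Illustrative Robbins-Monro step size, then projection onto the box.
        delta = 10.0 / (d * (n + 1) ** 0.8)
        theta = np.clip(theta + delta * grad, theta_min, theta_max)
        thetas.append(theta)
    return np.mean(thetas[n_iter // 2:])  # average the tail iterates
```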
Nonparametric empirical Bayes methods provide a flexible and attractive approach to high-dimensional data analysis. One particularly elegant empirical Bayes methodology, involving the Kiefer-Wolfowitz nonparametric maximum likelihood estimator (NPMLE) for mixture models, has been known for decades. However, implementation and theoretical analysis of the Kiefer-Wolfowitz NPMLE are notoriously difficult. A fast algorithm was recently proposed that makes NPMLE-based procedures feasible for use in large-scale problems, but the algorithm calculates only an approximation to the NPMLE. In this paper we make two contributions. First, we provide upper bounds on the convergence rate of the approximate NPMLE's statistical error, which have the same order as the best known bounds for the true NPMLE. This suggests that the approximate NPMLE is just as effective as the true NPMLE for statistical applications. Second, we illustrate the promise of NPMLE procedures in a high-dimensional binary classification problem. We propose a new procedure and show that it vastly outperforms existing methods in experiments with simulated data. In real data analyses involving cancer survival and gene expression data, we show that it is very competitive with several recently proposed methods for regularized linear discriminant analysis, another popular approach to high-dimensional classification.
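As a rough illustration of what a grid-based approximation to the Kiefer-Wolfowitz NPMLE looks like, the following generic EM sketch for a Gaussian location mixture restricts the mixing distribution to atoms on a fixed grid and updates the atom weights; this is a textbook-style stand-in, not the fast algorithm analyzed in the paper.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch: approximate NPMLE for a Gaussian location mixture by
# fixing a grid of candidate atoms and running EM on the mixing weights.
def approx_npmle(x, grid_size=300, n_iter=200):
    grid = np.linspace(x.min(), x.max(), grid_size)   # candidate atoms
    L = norm.pdf(x[:, None], loc=grid[None, :])       # L[i, j] = phi(x_i - u_j)
    w = np.full(grid_size, 1.0 / grid_size)           # uniform initial weights
    for _ in range(n_iter):
        mix = L @ w                                   # mixture density at each x_i
        w *= (L / mix[:, None]).mean(axis=0)          # EM multiplicative update
    return grid, w
```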
Consider a setting with $N$ independent individuals, each with an unknown parameter $p_i \in [0, 1]$ drawn from some unknown distribution $P^\star$. After observing the outcomes of $t$ independent Bernoulli trials per individual, i.e., $X_i \sim \text{Binomial}(t, p_i)$, our objective is to accurately estimate $P^\star$. This problem arises in numerous domains, including the social sciences, psychology, health care, and biology, where the size of the population under study is usually large while the number of observations per individual is often limited. Our main result shows that, in the regime where $t \ll N$, the maximum likelihood estimator (MLE) is both statistically minimax optimal and efficiently computable. Precisely, for sufficiently large $N$, the MLE achieves the information-theoretically optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log N$, with respect to the earth mover's distance between the estimated and true distributions. More generally, in an exponentially large interval of $t$ beyond $c\log N$, the MLE achieves the minimax error bound of $\mathcal{O}(\frac{1}{\sqrt{t\log N}})$. In contrast, regardless of how large $N$ is, the naive plug-in estimator for this problem only achieves the suboptimal error of $\Theta(\frac{1}{\sqrt{t}})$.
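A small simulation along these lines illustrates the gap between the plug-in estimator and the MLE in earth mover's distance. The setup below is hypothetical: a Beta(2,5) distribution plays the role of the true $P^\star$, and a grid-based EM approximation stands in for the exact mixture MLE studied in the paper.

```python
import numpy as np
from scipy.stats import binom, wasserstein_distance

rng = np.random.default_rng(0)
N, t = 10_000, 10
p = rng.beta(2, 5, size=N)                    # true p_i ~ P* (here Beta(2,5))
X = rng.binomial(t, p)                        # observed Binomial counts

# Naive plug-in: the empirical distribution of X_i / t.
plugin = X / t

# Approximate MLE: EM over atom weights on a fixed grid in [0, 1].
grid = np.linspace(0, 1, 101)
L = binom.pmf(X[:, None], t, grid[None, :])   # L[i, j] = P(X_i | t, grid_j)
w = np.full(grid.size, 1 / grid.size)
for _ in range(500):
    mix = L @ w
    w *= (L / mix[:, None]).mean(axis=0)

print("plug-in EMD:", wasserstein_distance(plugin, p))
print("MLE EMD:   ", wasserstein_distance(grid, p, u_weights=w))
```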
The purpose of this thesis is to develop new theory for high-dimensional structured signal recovery under the rather weak assumption that the measurements have only a finite number of moments. High-dimensional recovery has been one of the emerging topics of the last decade, partly due to the celebrated work of Candes, Romberg and Tao (e.g. [CRT06, CRT04]). The original analysis there (and the work that followed) requires a strong concentration argument (namely, the restricted isometry property), which holds only for a rather restricted class of measurements with light-tailed distributions. It had long been conjectured that high-dimensional recovery is possible even when restricted isometry type conditions do not hold, but a general theory remained out of reach until very recently, when the works [Men14a, KM15] proposed a new small-ball method. In these two papers, the authors initiated a new analysis framework for general empirical risk minimization (ERM) problems with respect to the square loss, which is robust and can potentially accommodate heavy-tailed loss functions. The material in this thesis is partly inspired by [Men14a], but takes a different approach: rather than directly analyzing existing ERMs for signal recovery, for which it is difficult to avoid strong moment assumptions, we show that, in many circumstances, by carefully re-designing the ERM to start with, one can still achieve the minimax optimal statistical rate of signal recovery with very high probability under much weaker assumptions than in existing works.
We prove a Bernstein-von Mises theorem for a general class of high-dimensional nonlinear Bayesian inverse problems in the vanishing noise limit. We propose a sufficient condition on the growth rate of the number of unknown parameters under which the posterior distribution is asymptotically normal. This growth condition is expressed explicitly in terms of the model dimension, the degree of ill-posedness of the inverse problem and the noise parameter. The theoretical results are applied to Bayesian estimation of the medium parameter in an elliptic problem.
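Schematically, and in generic notation that is not the paper's, a Bernstein-von Mises statement of this type says that as the noise level $\sigma \to 0$ the posterior is close in total variation to a Gaussian centered at an efficient estimator:

```latex
% Schematic Bernstein-von Mises statement (generic notation, not the paper's):
% \Pi(\cdot \mid Y) is the posterior, \hat\theta an efficient estimator (e.g.
% the MAP or MLE), and I(\theta_0) the Fisher information of the forward model.
\[
  \left\| \Pi(\cdot \mid Y)
    - \mathcal{N}\!\left(\hat\theta,\; \sigma^{2}\, I(\theta_0)^{-1}\right)
  \right\|_{\mathrm{TV}} \;\longrightarrow\; 0
  \quad \text{in } \mathbb{P}_{\theta_0}\text{-probability as } \sigma \to 0,
\]
% with the model dimension d = d(\sigma) allowed to grow, subject to a growth
% condition coupling d, the degree of ill-posedness, and \sigma.
```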
The empirical likelihood approach is a nonparametric statistical method used for hypothesis testing and for constructing confidence regions for pivotal unknown quantities. It has been applied to independent identically distributed random variables and to second-order stationary processes. In recent years, heavy-tailed data have been observed in many fields. To model such data suitably, we consider symmetric scalar and multivariate $\alpha$-stable linear processes generated by an infinite-variance innovation sequence. We use a Whittle-likelihood-type estimating function in the empirical likelihood ratio function and derive the asymptotic distribution of the empirical likelihood ratio statistic for $\alpha$-stable linear processes. With this approach, the theory of estimation and testing for second-order stationary processes extends, though not straightforwardly, to heavy-tailed data analyses and is applicable to many problems in financial statistics.
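In generic notation (a sketch, not the paper's exact construction), a frequency-domain empirical likelihood ratio built from a Whittle-type estimating function $m(\lambda, \theta)$ evaluated at the Fourier frequencies $\lambda_j = 2\pi j/n$ takes the form:

```latex
% Sketch of a frequency-domain empirical likelihood ratio (generic notation):
% m(\lambda, \theta) is a Whittle-type estimating function built from the
% (self-normalized) periodogram I_n(\lambda) and the model spectrum f_\theta(\lambda).
\[
  R(\theta) = \max\left\{ \prod_{j=1}^{n} n w_j \;:\;
    \sum_{j=1}^{n} w_j\, m(\lambda_j, \theta) = 0,\;
    w_j \ge 0,\; \sum_{j=1}^{n} w_j = 1 \right\},
\]
% with, for instance, m(\lambda, \theta) = \frac{\partial}{\partial \theta}
% \left\{ \log f_\theta(\lambda) + I_n(\lambda) / f_\theta(\lambda) \right\}.
```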