The asymptotic variance of the maximum likelihood estimate is proved to decrease when the maximization is restricted to a subspace that contains the true parameter value. Maximum likelihood estimation allows systematic fitting of covariance models to the sample, which is important in data assimilation. The hierarchical maximum likelihood approach is applied to the spectral diagonal covariance model with different parameterizations of eigenvalue decay, and to the sparse inverse covariance model with specified parameter values on different sets of nonzero entries. It is shown computationally that using smaller sets of parameters can substantially decrease the sampling noise in high dimension.
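To illustrate why a low-dimensional parameterization suppresses sampling noise, the sketch below fits a one-parameter power-law eigenvalue decay to a diagonal Gaussian covariance by maximum likelihood and compares it with the raw per-coordinate sample variances. The power-law form, the dimensions, and all names here are illustrative assumptions, not the paper's exact models.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p, N, a_true = 100, 10, 1.5                      # dimension, sample size (assumed)
idx = np.arange(1, p + 1)
d_true = idx ** (-a_true)                        # assumed power-law eigenvalue decay
X = rng.normal(size=(N, p)) * np.sqrt(d_true)    # N samples from N(0, diag(d_true))
s = (X ** 2).mean(axis=0)                        # raw per-coordinate sample variances

def neg_loglik(a):
    """Negative Gaussian log-likelihood for the diagonal model d_i = i^(-a)."""
    d = idx ** (-a)
    return 0.5 * N * np.sum(np.log(d) + s / d)

a_hat = minimize_scalar(neg_loglik, bounds=(0.1, 5.0), method="bounded").x
d_hat = idx ** (-a_hat)

# one-parameter MLE fit vs. p-parameter raw sample variances
err_mle = np.linalg.norm(d_hat - d_true)
err_raw = np.linalg.norm(s - d_true)
```

With only N = 10 samples in dimension p = 100, the one-parameter fit pools information across all coordinates, so its error is typically far below that of the p noisy raw variances.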
We present new results for consistency of maximum likelihood estimators with a focus on multivariate mixed models. Our theory builds on the idea of using subsets of the full data to establish consistency of estimators based on the full data. It requires neither that the data consist of independent observations, nor that the observations can be modeled as a stationary stochastic process. Compared to existing asymptotic theory using the idea of subsets we substantially weaken the assumptions, bringing them closer to what suffices in classical settings. We apply our theory in two multivariate mixed models for which it was unknown whether maximum likelihood estimators are consistent. The models we consider have non-stochastic predictors and multivariate responses which are possibly mixed-type (some discrete and some continuous).
Estimating the matrix of connection probabilities is one of the key questions when studying sparse networks. In this work, we consider networks generated under the sparse graphon model and the inhomogeneous random graph model with missing observations. Using the Stochastic Block Model as a parametric proxy, we bound the risk of the maximum likelihood estimator of the network connection probabilities, and show that it is minimax optimal. When risk is measured in Frobenius norm, no estimator running in polynomial time has been shown to attain the minimax optimal rate of convergence for this problem. Thus, maximum likelihood estimation is of particular interest, as computationally efficient approximations to it have been proposed in the literature and are often used in practice.
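To make the parametric proxy concrete, the sketch below computes the maximum likelihood estimate of the block connection probabilities in a stochastic block model when the block labels are known; in the paper's setting the labels must be estimated as well and observations may be missing, so this is a simplified illustration with assumed names and a two-block configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 2
z = rng.integers(K, size=n)                       # block labels (assumed known here)
P_true = np.array([[0.10, 0.02],
                   [0.02, 0.08]])                 # illustrative connection probabilities
U = np.triu(rng.random((n, n)) < P_true[z][:, z], 1)
A = (U | U.T).astype(int)                         # symmetric adjacency, no self-loops

# MLE given the labels: empirical edge frequency between each pair of blocks
P_hat = np.zeros((K, K))
upper = np.triu(np.ones((n, n), dtype=bool), 1)
for a in range(K):
    for b in range(K):
        mask = np.outer(z == a, z == b)
        if a == b:
            mask &= upper                         # count each within-block pair once
        P_hat[a, b] = A[mask].mean()
```

When the labels are unknown, the MLE maximizes over label assignments as well, which is the source of the computational hardness the abstract alludes to.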
The saddlepoint approximation gives an approximation to the density of a random variable in terms of its moment generating function. When the underlying random variable is itself the sum of $n$ unobserved i.i.d. terms, the basic classical result is that the relative error in the density is of order $1/n$. If instead the approximation is interpreted as a likelihood and maximised as a function of model parameters, the result is an approximation to the maximum likelihood estimate (MLE) that can be much faster to compute than the true MLE. This paper proves the analogous basic result for the approximation error between the saddlepoint MLE and the true MLE: subject to certain explicit identifiability conditions, the error has asymptotic size $O(1/n^2)$ for some parameters, and $O(1/n^{3/2})$ or $O(1/n)$ for others. In all three cases, the approximation errors are asymptotically negligible compared to the inferential uncertainty. The proof is based on a factorisation of the saddlepoint likelihood into an exact and approximate term, along with an analysis of the approximation error in the gradient of the log-likelihood. This factorisation also gives insight into alternatives to the saddlepoint approximation, including a new and simpler saddlepoint approximation, for which we derive analogous error bounds. As a corollary of our results, we also obtain the asymptotic size of the MLE approximation error when the saddlepoint approximation is replaced by the normal approximation.
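The classical $1/n$ relative error can be checked numerically in a case where the exact density is available: a sum of $n$ Exp(1) variables is Gamma($n$, 1). The sketch below is a minimal implementation of the saddlepoint density approximation under that assumption; the function and variable names are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma

def saddlepoint_density(x, n, K, Kp, Kpp, bracket):
    """Saddlepoint approximation to the density at x of a sum of n i.i.d. terms
    with cumulant generating function K (Kp, Kpp are its first two derivatives)."""
    s_hat = brentq(lambda s: n * Kp(s) - x, *bracket)   # solve n K'(s) = x
    return np.exp(n * K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * n * Kpp(s_hat))

# Exp(1): K(s) = -log(1 - s), defined for s < 1
K = lambda s: -np.log(1 - s)
Kp = lambda s: 1.0 / (1 - s)
Kpp = lambda s: 1.0 / (1 - s) ** 2

def rel_err(n, x):
    """Relative error of the saddlepoint density against the exact Gamma(n, 1) density."""
    approx = saddlepoint_density(x, n, K, Kp, Kpp, (-50.0, 0.999))
    return abs(approx / gamma.pdf(x, a=n) - 1)

e20, e80 = rel_err(20, 25.0), rel_err(80, 100.0)   # relative error shrinks like ~1/(12n)
```

For the Gamma case the relative error is constant in $x$ (the approximation is exact up to normalisation), which makes the $1/n$ decay easy to see as $n$ grows.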
Let $\mathbf{X}_n=(x_{ij})$ be a $k \times n$ data matrix with complex-valued, independent and standardized entries satisfying a Lindeberg-type moment condition. We consider simultaneously $R$ sample covariance matrices $\mathbf{B}_{nr}=\frac{1}{n} \mathbf{Q}_r \mathbf{X}_n \mathbf{X}_n^*\mathbf{Q}_r^\top,~1\le r\le R$, where the $\mathbf{Q}_{r}$'s are nonrandom real matrices with common dimensions $p\times k~(k\geq p)$. Assuming that both the dimension $p$ and the sample size $n$ grow to infinity, the limiting distributions of the eigenvalues of the matrices $\{\mathbf{B}_{nr}\}$ are identified, and as the main result of the paper, we establish a joint central limit theorem for linear spectral statistics of the $R$ matrices $\{\mathbf{B}_{nr}\}$. Next, this new CLT is applied to the problem of testing a high-dimensional white noise in time series modelling. In experiments the derived test has a controlled size and is significantly faster than the classical permutation test, though it does have lower power. This application highlights the necessity of such a joint CLT in the presence of several dependent sample covariance matrices. In contrast, all the existing works on CLTs for linear spectral statistics of large sample covariance matrices deal with a single sample covariance matrix ($R=1$).
We find limiting distributions of the nonparametric maximum likelihood estimator (MLE) of a log-concave density, that is, a density of the form $f_0=\exp\varphi_0$ where $\varphi_0$ is a concave function on $\mathbb{R}$. The pointwise limiting distributions depend on the second and third derivatives at 0 of $H_k$, the lower invelope of an integrated Brownian motion process minus a drift term depending on the number of vanishing derivatives of $\varphi_0=\log f_0$ at the point of interest. We also establish the limiting distribution of the resulting estimator of the mode $M(f_0)$ and prove a new local asymptotic minimax lower bound which shows the optimality of our mode estimator in terms of both rate of convergence and dependence of constants on population values.