We study the M-estimation method for the high-dimensional linear regression model and discuss the properties of the M-estimator when the penalty term is a local linear approximation. In fact, M-estimation is a general framework that covers least absolute deviation, quantile regression, least squares regression and Huber regression. We show that the proposed estimator possesses good properties under certain assumptions. In the numerical simulation part, we select an appropriate algorithm to demonstrate the good robustness of this method.
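As a concrete illustration of the M-estimation framework mentioned above, the following is a minimal sketch of Huber regression fitted by iteratively reweighted least squares (IRLS). The function name, the tuning constant `delta`, and the simulated data are all illustrative choices, not taken from the paper; the penalised, high-dimensional version studied there is not reproduced here.

```python
import numpy as np

def huber_m_estimate(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """M-estimation of regression coefficients under the Huber loss,
    solved by iteratively reweighted least squares (IRLS).
    `delta` is the Huber tuning constant (an illustrative default)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: weight 1 in the quadratic region, delta/|r| beyond it
        absr = np.maximum(np.abs(r), 1e-12)
        w = np.where(absr <= delta, 1.0, delta / absr)
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.standard_normal(200)
y[:10] += 15.0                       # heavy outliers in the response
beta_hat = huber_m_estimate(X, y)    # remains close to beta_true
```

The downweighting of large residuals is what gives the Huber M-estimator its robustness to the outliers injected above, whereas ordinary least squares would be pulled towards them.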
For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations, and more importantly, these distributional approximations are feasible only for a moderate dimension of the dependent variable, say $p\le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations have been proposed in the literature for Wilks' $\Lambda$, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large-$p$, large-$n$ context, but also for moderately large data dimensions such as $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data.
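For reference, the classical statistic discussed above can be sketched as follows: $\Lambda = \det(E)/\det(E+H)$, where $H$ and $E$ are the hypothesis and error sums-of-squares-and-cross-products matrices, and $-n\log\Lambda$ is the large-sample statistic underlying the chi-square approximation. This is only the classical construction; the random-matrix correction proposed in the paper is not reproduced here, and the function name and simulated sizes are illustrative.

```python
import numpy as np

def wilks_log_lambda(Y, X):
    """Log of Wilks' Lambda for H0: B = 0 in the model Y = X B + E.
    Lambda = det(E) / det(E + H), with H the hypothesis SSCP and
    E the error SSCP matrix (a minimal sketch of the classical test)."""
    n = Y.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)    # projection onto span(X)
    H = Y.T @ P @ Y                          # hypothesis SSCP
    E = Y.T @ (np.eye(n) - P) @ Y            # error SSCP
    _, logdet_E = np.linalg.slogdet(E)
    _, logdet_EH = np.linalg.slogdet(E + H)
    return logdet_E - logdet_EH              # = log Lambda, always <= 0

rng = np.random.default_rng(0)
n, p, q = 100, 30, 3                         # "moderately large" p, as above
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))              # data generated under H0: B = 0
log_lam = wilks_log_lambda(Y, X)
chi2_stat = -n * log_lam                     # classical large-sample statistic
```

Since $H$ is positive semi-definite, $\det(E+H)\ge\det(E)$ and $\log\Lambda\le 0$; the abstract's point is that referring $-n\log\Lambda$ to its fixed-$(p,q)$ chi-square limit becomes unreliable once $p$ grows with $n$.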
We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a MissCUSUM transformation (a generalisation of the popular cumulative sum, or CUSUM, statistic) that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify the key interaction between the missingness and the signal as a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data and on an oceanographic data set covering the Neogene period.
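The overall pipeline described above can be sketched in simplified form: compute a CUSUM-type statistic per coordinate from the observed entries only, aggregate across coordinates, and take the argmax. This sketch is not the paper's MissCUSUM transformation or its sparsity-tailored projection; it substitutes a plain sum of squares for the penalised projection step, and all names and simulation parameters are illustrative.

```python
import numpy as np

def cusum_missing(x, obs):
    """CUSUM-type statistic for one coordinate with missing entries.
    Only values with obs[t] == True enter the partial sums.
    A simplified stand-in, not the paper's exact MissCUSUM."""
    n = len(x)
    S = np.cumsum(np.where(obs, x, 0.0))   # partial sums of observed values
    N = np.cumsum(obs.astype(float))       # number observed up to each time
    T = np.zeros(n - 1)
    total_S, total_N = S[-1], N[-1]
    for t in range(n - 1):
        n1, n2 = N[t], total_N - N[t]
        if n1 == 0 or n2 == 0:
            continue
        diff = S[t] / n1 - (total_S - S[t]) / n2
        T[t] = np.sqrt(n1 * n2 / (n1 + n2)) * diff
    return T

def estimate_changepoint(X, obs):
    """Aggregate the coordinate-wise statistics by a sum of squares and
    return the argmax as the changepoint estimate (an illustrative
    aggregation; the paper instead projects along a learned direction)."""
    p, n = X.shape
    agg = np.zeros(n - 1)
    for j in range(p):
        agg += cusum_missing(X[j], obs[j]) ** 2
    return int(np.argmax(agg)) + 1

rng = np.random.default_rng(1)
p, n, z = 20, 200, 120
X = rng.standard_normal((p, n))
X[:3, z:] += 2.0                    # mean change in a sparse subset of coords
obs = rng.random((p, n)) < 0.7      # each entry observed with probability 0.7
z_hat = estimate_changepoint(X, obs)
```

The $\sqrt{n_1 n_2/(n_1+n_2)}$ scaling mirrors the classical CUSUM normalisation but uses observed counts, so coordinates with heavier missingness contribute less, in the spirit of the observation-probability weights identified in the abstract.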
We obtain explicit error bounds for the $d$-dimensional normal approximation on hyperrectangles for a random vector that has a Stein kernel, or admits an exchangeable pair coupling, or is a non-linear statistic of independent random variables or a sum of $n$ locally dependent random vectors. We assume the approximating normal distribution has a non-singular covariance matrix. The error bounds vanish even when the dimension $d$ is much larger than the sample size $n$. We prove our main results using the approach of Götze (1991) in Stein's method, together with modifications of an estimate of Anderson, Hall and Titterington (1998) and a smoothing inequality of Bhattacharya and Rao (1976). For sums of $n$ independent and identically distributed isotropic random vectors having a log-concave density, we obtain an error bound that is optimal up to a $\log n$ factor. We also discuss an application to multiple Wiener–Itô integrals.
Let $\mathbf{X}_{k}=(x_{k1}, \cdots, x_{kp})$, $k=1,\cdots,n$, be a random sample of size $n$ coming from a $p$-dimensional population. For a fixed integer $m\geq 2$, consider a hypercubic random tensor $\mathbf{T}$ of $m$-th order and rank $n$ with \begin{eqnarray*} \mathbf{T}= \sum_{k=1}^{n}\underbrace{\mathbf{X}_{k}\otimes\cdots\otimes \mathbf{X}_{k}}_{m\ \text{times}}=\Big(\sum_{k=1}^{n} x_{ki_{1}}x_{ki_{2}}\cdots x_{ki_{m}}\Big)_{1\leq i_{1},\cdots, i_{m}\leq p}. \end{eqnarray*} Let $W_n$ be the largest off-diagonal entry of $\mathbf{T}$. We derive the asymptotic distribution of $W_n$ under a suitable normalization in two cases: the ultra-high-dimension case with $p\to\infty$ and $\log p=o(n^{\beta})$, and the high-dimension case with $p\to\infty$ and $p=O(n^{\alpha})$, where $\alpha,\beta>0$. The normalizing constant of $W_n$ depends on $m$, and the limiting distribution of $W_n$ is a Gumbel-type distribution involving the parameter $m$.
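The statistic $W_n$ can be computed directly for small $p$ and $m$ by enumerating index tuples; for $m=2$ the tensor reduces to the Gram matrix $X^{\top}X$ of the $n\times p$ data matrix, which gives a check on the brute-force computation. In this sketch, "off-diagonal" is read as indices not all equal, which is an assumption of the illustration rather than a definition quoted from the paper.

```python
import itertools
import numpy as np

def W_n(X, m=2):
    """Largest off-diagonal entry of T = sum_k X_k^{(x) m}, the m-th
    order tensor of the rows X_k of the n x p matrix X.  'Off-diagonal'
    is taken here to mean indices not all equal (an assumed reading).
    Brute force over index tuples; only practical for small p and m."""
    n, p = X.shape
    best = -np.inf
    for idx in itertools.product(range(p), repeat=m):
        if len(set(idx)) == 1:
            continue                            # skip diagonal entries
        # entry T[i1,...,im] = sum_k x_{k,i1} * ... * x_{k,im}
        val = np.prod(X[:, idx], axis=1).sum()
        best = max(best, val)
    return best

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 5))
w2 = W_n(X, m=2)
# For m = 2, T = X^T X, so w2 must equal the Gram matrix's largest
# off-diagonal entry:
gram_offdiag = (X.T @ X)[~np.eye(5, dtype=bool)].max()
```

The asymptotic Gumbel-type limit in the abstract concerns the regime $p\to\infty$; this enumeration is purely a finite-sample illustration of the quantity being normalised.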
We study variance estimation and associated confidence intervals for parameters characterizing genetic effects from genome-wide association study (GWAS) misspecified mixed model analysis. Previous studies have shown that, in spite of the model misspecification, certain quantities of genetic interest are estimable, and consistent estimators of these quantities can be obtained using the restricted maximum likelihood (REML) method under a misspecified linear mixed model. However, the asymptotic variance of such a REML estimator is complicated and not readily implementable for practical use. In this paper, we develop practical and computationally convenient methods for estimating such asymptotic variances and constructing the associated confidence intervals. The performance of the proposed methods is evaluated empirically based on Monte Carlo simulations and a real-data application.