Kernel density estimation is a well-known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used, bandwidth selection remains a challenging issue in terms of balancing algorithmic performance and statistical relevance. The purpose of this paper is to compare a recently developed bandwidth selection method for kernel density estimation to those in common use (at least those implemented in R). This new method is called Penalized Comparison to Overfitting (PCO). It was proposed by some of the authors of this paper in a previous work devoted to its statistical relevance from a purely theoretical perspective. It is compared here to other usual bandwidth selection methods for univariate and multivariate kernel density estimation on the basis of intensive simulation studies. In particular, cross-validation and plug-in criteria are numerically investigated and compared to PCO. The take-home message is that PCO can outperform the classical methods without additional algorithmic cost.
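To make the role of the bandwidth concrete, the following is a minimal sketch of a univariate Gaussian kernel density estimator. It does not implement PCO, cross-validation, or any plug-in criterion from the paper; the bandwidth is chosen by Silverman's rule of thumb purely as a simple stand-in, and the function names `gaussian_kde_1d` and `silverman_bandwidth` are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distances between every grid point and every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernel.mean(axis=1) / bandwidth

def silverman_bandwidth(samples):
    """Silverman's rule of thumb (an assumption here, not the paper's PCO)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sd = samples.std(ddof=1)
    q75, q25 = np.percentile(samples, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * n ** (-0.2)

# Illustrative data: 500 draws from a standard normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
grid = np.linspace(-4.0, 4.0, 201)
h = silverman_bandwidth(x)
density = gaussian_kde_1d(x, grid, h)
```

A smaller `bandwidth` gives a rougher estimate that follows the sample closely, while a larger one oversmooths; data-driven selectors such as those compared in the paper aim to balance the two.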
In multivariate regression, also referred to as multi-task learning in machine learning, the goal is to recover a vector-valued function from noisy observations. The vector-valued function is often assumed to be of low rank. Although multivariate linear regression has been extensively studied in the literature, a theoretical study of multivariate nonlinear regression is lacking. In this paper, we study reduced rank multivariate kernel ridge regression, proposed by \cite{mukherjee2011reduced}. We prove the consistency of the function predictor and provide the convergence rate. An algorithm based on nuclear norm relaxation is proposed. A few numerical examples are presented to show that it achieves a smaller mean squared prediction error than elementwise univariate kernel ridge regression.
From an optimizer's perspective, achieving the global optimum for a general nonconvex problem is often provably NP-hard under classical worst-case analysis. In the case of Cox's proportional hazards model, by taking its statistical model structure into account, we identify local strong convexity near the global optimum. Motivated by this, we propose to use two convex programs to optimize the folded-concave penalized Cox's proportional hazards regression. Theoretically, we investigate the statistical and computational tradeoffs of the proposed algorithm and establish the strong oracle property of the resulting estimators. Numerical studies and real data analysis lend further support to our algorithm and theory.
Statistical methods with empirical likelihood (EL) are appealing and effective, especially in conjunction with estimating equations, through which useful data information can be adaptively and flexibly incorporated. It is also known in the literature that EL approaches encounter difficulties when dealing with problems having high-dimensional model parameters and estimating equations. To overcome these challenges, we begin our study with a careful investigation of high-dimensional EL from a new scope targeting the estimation of high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameter. Motivated by the new scope, we then propose a new penalized EL that applies two penalty functions, regularizing respectively the model parameters and the associated Lagrange multipliers in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, we show that drastic dimension reduction in the number of estimating equations can be effectively achieved without compromising the validity and consistency of the resulting estimators. Most attractively, such a reduction in the dimensionality of the estimating equations is actually equivalent to a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for high-dimensional sparse model parameters. Allowing the dimensionalities of both the model parameters and the estimating equations to grow exponentially with the sample size, our theory demonstrates that the estimator from our new penalized EL is sparse and consistent, with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL performs promisingly.
Starting with the Fourier integral theorem, we present natural Monte Carlo estimators of multivariate functions including densities, mixing densities, transition densities, regression functions, and the search for modes of multivariate density functions (modal regression). Rates of convergence are established and, in many cases, provide superior rates to current standard estimators such as those based on kernels, including kernel density estimators and kernel regression functions. Numerical illustrations are presented.
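As a rough illustration of how a Monte Carlo density estimator can arise from the Fourier integral theorem, the sketch below averages a product of terms sin(R(y_j - X_ij)) / (pi (y_j - X_ij)) over the sample. This particular form, the cutoff parameter `R`, and the function name `fourier_density_estimate` are assumptions made for illustration and may differ from the estimators studied in the paper.

```python
import numpy as np

def fourier_density_estimate(samples, points, R):
    """Sinc-type density estimate (illustrative form, assumed here).

    samples: array of shape (n, d); points: array of shape (m, d);
    R: positive frequency cutoff playing the role of a smoothing parameter.
    """
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    points = np.atleast_2d(np.asarray(points, dtype=float))
    # Pairwise coordinate differences, shape (m, n, d).
    diff = points[:, None, :] - samples[None, :, :]
    # sin(R u) / (pi u) = (R / pi) * sinc(R u / pi), with np.sinc(x) = sin(pi x) / (pi x);
    # np.sinc also handles u = 0 without a division by zero.
    terms = (R / np.pi) * np.sinc(R * diff / np.pi)
    # Product over coordinates, then Monte Carlo average over the sample.
    return terms.prod(axis=2).mean(axis=1)

# Illustrative check on standard normal data in one dimension.
rng = np.random.default_rng(0)
x = rng.normal(size=(20000, 1))
estimate_at_zero = fourier_density_estimate(x, np.array([[0.0]]), R=3.0)[0]
true_value = 1.0 / np.sqrt(2.0 * np.pi)
```

Unlike a standard kernel density estimate, an estimate of this form can take negative values, since the sinc-type term is not a probability density.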
Selecting the important covariates and dropping the unimportant ones from a high-dimensional regression model is a long-standing problem and hence has received a lot of attention in the last two decades. After selecting the correct model, it is also important to properly estimate the parameters corresponding to the important covariates. In this spirit, Fan and Li (2001) proposed the oracle property as a desired feature of a variable selection method. The oracle property has two parts: one is variable selection consistency (VSC) and the other is asymptotic normality. Keeping VSC fixed and making the other part stronger, Fan and Lv (2008) introduced the strong oracle property. In this paper, we consider different penalized regression techniques which are VSC and classify them based on the oracle and strong oracle properties. We show that both the residual and the perturbation bootstrap methods are second-order correct for any penalized estimator irrespective of its class. Most interesting of all is the Lasso, introduced by Tibshirani (1996). Although the Lasso is VSC, it is not asymptotically normal and hence fails to satisfy the oracle property.