ترغب بنشر مسار تعليمي؟ اضغط هنا

104 - Alain Celisse 2014
We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with $p=1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n=o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as $p/n$ is conveniently related to the rate of convergence of the best estimator in the collection: (i) $p/nto1$ as $nto+infty$ with a parametric rate, and (ii) $p/n=o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments.
This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge on the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross -validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.
In the multiple testing context, a challenging problem is the estimation of the proportion $pi_0$ of true-null hypotheses. A large number of estimators of this quantity rely on identifiability assumptions that either appear to be violated on real dat a, or may be at least relaxed. Under independence, we propose an estimator $hat{pi}_0$ based on density estimation using both histograms and cross-validation. Due to the strong connection between the false discovery rate (FDR) and $pi_0$, many multiple testing procedures (MTP) designed to control the FDR may be improved by introducing an estimator of $pi_0$. We provide an example of such an improvement (plug-in MTP) based on the procedure of Benjamini and Hochberg. Asymptotic optimality results may be derived for both $hat{pi}_0$ and the resulting plug-in procedure. The latter ensures the desired asymptotic control of the FDR, while it is more powerful than the BH-procedure. Finally, we compare our estimator of $pi_0$ with other widespread estimators in a wide range of simulations. We obtain better results than other tested methods in terms of mean square error (MSE) of the proposed estimator. Finally, both asymptotic optimality results and the interest in tightly estimating $pi_0$ are confirmed (empirically) by results obtained with the plug-in MTP.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا