In survey sampling, calibration is a widely used tool for making total estimators consistent with known totals of auxiliary variables and for reducing variance. When the number of auxiliary variables is large, calibration on all of them may yield estimators of totals whose mean squared error (MSE) exceeds the MSE of the Horvitz-Thompson estimator, even though this simple estimator makes no use of the available auxiliary information. In this paper we study a new technique, based on dimension reduction through principal components, that can be useful in this high-dimensional context. Calibration is performed on the first principal components, which can be viewed as synthetic variables capturing the largest share of the variability of the auxiliary variables. When some auxiliary variables play a more important role than others, the method can be adapted to provide exact calibration on these important variables. Some asymptotic properties are given in which the number of variables is allowed to tend to infinity with the population size. A data-driven criterion for selecting the number of principal components, ensuring that all the sampling weights remain positive, is discussed. The methodology is illustrated, in a multipurpose context, by an application to the estimation of electricity consumption for each day of a week, using 336 auxiliary variables consisting of past consumption measured every half hour over the previous week.
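To make the mechanics concrete, the following is a minimal NumPy sketch of calibration on principal components using the classical chi-square (linear) calibration distance; it is not the authors' code. It assumes the full population matrix of auxiliary variables is known (so the principal components and their population totals can be computed exactly), uses simple random sampling with design weights N/n, and omits the partial-calibration variant and the positivity check on the weights discussed in the abstract. The function name pc_calibration_weights and all variable names are illustrative.

```python
import numpy as np

def pc_calibration_weights(X_pop, sample_idx, d, n_components):
    """Calibrate design weights on the first principal components (linear method).

    X_pop        : (N, p) population matrix of auxiliary variables (assumed fully known)
    sample_idx   : (n,) indices of the sampled units
    d            : (n,) design weights, e.g. 1/pi_k (Horvitz-Thompson weights)
    n_components : number r of principal components to calibrate on
    """
    N = X_pop.shape[0]
    # Principal components of the centred auxiliary variables, computed at the
    # population level (a simplifying assumption for this sketch).
    Xc = X_pop - X_pop.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # (N, r) PC scores

    # Calibration variables: an intercept (so the weights sum to N) plus the
    # first r PC scores; their population totals are taken as known.
    Z_pop = np.column_stack([np.ones(N), scores])
    t_z = Z_pop.sum(axis=0)
    Z_s = Z_pop[sample_idx]

    # Chi-square (linear) calibration: w_k = d_k (1 + z_k' lam), with lam chosen
    # so that the calibrated totals of z reproduce t_z exactly.
    A = (Z_s * d[:, None]).T @ Z_s               # sum over the sample of d_k z_k z_k'
    lam = np.linalg.solve(A, t_z - d @ Z_s)
    return d * (1.0 + Z_s @ lam)


# Toy usage: simple random sampling without replacement, design weights N/n.
rng = np.random.default_rng(0)
N, p, n = 5000, 336, 400
X_pop = rng.normal(size=(N, p)) @ rng.normal(size=(p, p)) / p   # correlated auxiliaries
y_pop = X_pop[:, :5].sum(axis=1) + rng.normal(size=N)

sample_idx = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)
w = pc_calibration_weights(X_pop, sample_idx, d, n_components=10)
t_hat = w @ y_pop[sample_idx]                    # calibrated estimator of the total of y
```

With these weights, the estimated totals of the intercept and of the retained PC scores match their population values exactly, while only r + 1 calibration equations are solved instead of one per auxiliary variable, which is the point of the dimension-reduction step.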
In causal inference, principal stratification is a framework for dealing with a posttreatment intermediate variable between a treatment and an outcome, in which the principal strata are defined by the joint potential values of the intermediate variab
We present a new functional Bayes classifier that uses principal component (PC) or partial least squares (PLS) scores from the common covariance function, that is, the covariance function marginalized over groups. When the groups have different covar
In many machine learning applications, it is important for the model to provide confidence scores that accurately captures its prediction uncertainty. Although modern learning methods have achieved great success in predictive accuracy, generating cal
We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides w
In this paper, we extend graph-based identification methods by allowing background knowledge in the form of non-zero parameter values. Such information could be obtained, for example, from a previously conducted randomized experiment, from substantiv