Research papers and Master's and PhD theses published by Camelia Goga

Improving the estimation of the odds-ratio using auxiliary information

C. Goga, A. Ruiz-Gazen, 2014

The odds ratio is used in health and social surveys where the odds of a certain event are to be compared between two populations. It is defined through logistic regression and requires survey data to be accompanied by their sampling weights. A nonparametric estimation method that incorporates survey weights and auxiliary information may improve the precision of the odds-ratio estimator. The method relies on $B$-spline calibration, which can handle the nonlinear structure of the parameter. The variance is estimated through linearization. The method can be implemented with standard survey software. As two examples show, the gain in precision depends on the data.

Methodology
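As a point of reference for the odds-ratio abstract above, here is a minimal sketch of the plain survey-weighted odds ratio. It omits the $B$-spline calibration step. It assumes a binary event and a binary group indicator. In that case, the exponentiated slope of a weighted logistic regression of the event on the group indicator equals the odds ratio of the weighted 2×2 table, so the table can be used directly. The function name `weighted_odds_ratio` is hypothetical. The weights are assumed to be design weights (for example $d_k = 1/\pi_k$); a calibrated estimator would pass calibrated weights instead.

```python
from typing import Sequence


def weighted_odds_ratio(
    event: Sequence[int], group: Sequence[int], weights: Sequence[float]
) -> float:
    """Hypothetical helper: plug-in odds ratio of `event` (0/1) for group 1 vs group 0.

    Each weighted cell total estimates the matching population count, so the
    ratio of weighted odds estimates the population odds ratio.
    """
    # cell[g][e] = sum of survey weights of sampled units in group g with event e
    cell = [[0.0, 0.0], [0.0, 0.0]]
    for e, g, w in zip(event, group, weights):
        cell[g][e] += w
    odds_group1 = cell[1][1] / cell[1][0]  # weighted odds of the event in group 1
    odds_group0 = cell[0][1] / cell[0][0]  # weighted odds of the event in group 0
    return odds_group1 / odds_group0
```

With all weights equal to 1, the function reduces to the ordinary sample odds ratio.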

Calibration and partial calibration on principal components when the number of auxiliary variables is large

H. Cardot, C. Goga, M.-A. Shehzad, 2014

In survey sampling, calibration is a very popular tool used to make total estimators consistent with known totals of auxiliary variables and to reduce variance. When the number of auxiliary variables is large, calibration on all the variables may lead to estimators of totals whose mean squared error (MSE) is larger than the MSE of the Horvitz-Thompson estimator, even though this simple estimator does not take the available auxiliary information into account. In this paper we study a new technique based on dimension reduction through principal components that can be useful in this high-dimensional context. Calibration is performed on the first principal components, which can be viewed as synthetic variables containing the most important part of the variability of the auxiliary variables. When some auxiliary variables play a more important role than others, the method can be adapted to provide exact calibration on these important variables. Some asymptotic properties are given in which the number of variables is allowed to tend to infinity with the population size. A data-driven criterion for selecting the number of principal components, ensuring that all sampling weights remain positive, is discussed. The methodology is illustrated, in a multipurpose context, by an application to the estimation of electricity consumption for each day of a week using 336 auxiliary variables, namely the past consumption measured every half hour over the previous week.

Methodology
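The calibration-on-principal-components idea from the abstract above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses linear (chi-square distance) calibration and computes the principal components from the full population matrix of auxiliary variables, which is assumed known for every unit. It then calibrates on an intercept plus the first `r` components, with `r` supplied by the user rather than chosen by the paper's data-driven criterion. The partial-calibration variant is not shown, and both function names are hypothetical.

```python
import numpy as np


def linear_calibration(d: np.ndarray, x_sample: np.ndarray, totals: np.ndarray) -> np.ndarray:
    """Hypothetical helper: linear calibration weights w_k = d_k * (1 + x_k @ lam).

    lam solves (sum_k d_k x_k x_k^T) lam = totals - sum_k d_k x_k, so the
    returned weights satisfy sum_k w_k x_k = totals exactly.
    """
    dx = d[:, None] * x_sample
    t_hat = dx.sum(axis=0)  # Horvitz-Thompson estimates of the totals
    cross = x_sample.T @ dx  # sum_k d_k x_k x_k^T
    lam = np.linalg.solve(cross, totals - t_hat)
    return d * (1.0 + x_sample @ lam)


def pc_calibration(x_pop: np.ndarray, sample_idx: np.ndarray, d: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: calibrate on an intercept plus the first r principal components.

    x_pop: (N, p) auxiliary values known for every population unit.
    sample_idx: indices of the n sampled units; d: their design weights.
    """
    x_centered = x_pop - x_pop.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    z_pop = x_centered @ vt[:r].T  # (N, r) principal component scores
    design = np.column_stack([np.ones(len(x_pop)), z_pop])
    # Totals: N for the intercept, then (numerically) zero for the centered scores.
    totals = design.sum(axis=0)
    return linear_calibration(d, design[sample_idx], totals)
```

Because the population scores are centered, their calibration totals are zero, so only the intercept total $N$ is non-trivial. Nothing in this sketch prevents negative weights; the positivity-preserving choice of the number of components described in the abstract is meant to address that.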

Efficient Estimation of Nonlinear Finite Population Parameters Using Nonparametrics

Camelia Goga, Anne Ruiz-Gazen, 2012

High-precision estimation of nonlinear parameters such as Gini indices, low-income proportions or other measures of inequality is currently of particular importance. In this paper, we propose a general class of estimators for such parameters that take into account univariate auxiliary information assumed to be known for every unit in the population. Through a nonparametric model-assisted approach, we construct a single system of survey weights that can be used, via a plug-in principle, to estimate any nonlinear parameter associated with any study variable of the survey. Based on a rigorous functional approach and a linearization principle, the asymptotic variance of the proposed estimators is derived, and the variance estimators are shown to be consistent under mild assumptions. The theory is fully detailed for penalized B-spline estimators, together with suggestions for practical implementation and guidelines for choosing the smoothing parameters. The validity of the method is demonstrated on data extracted from the French Labor Force Survey. Point estimates and confidence intervals are derived for the Gini index and the low-income proportion. Theoretical and empirical results highlight the advantage of a nonparametric approach over a parametric one when estimating nonlinear parameters in the presence of auxiliary information.

Methodology, Statistical applications
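The plug-in principle in the abstract above means that once a single set of weights $w_k$ is available, each nonlinear parameter is estimated by evaluating its population formula on the weighted sample. Below is a minimal sketch for the two parameters used in the application. The weights are taken as given; the penalized B-spline construction of the weights is not reproduced. The low-income threshold is assumed to be 60% of the weighted median, a common convention that the abstract does not state. The quantile definition is one of several in use, and all function names are hypothetical. The Gini estimator is $\hat G = \sum_{i,j} w_i w_j |y_i - y_j| / (2 \hat N \hat Y)$, computed in $O(n \log n)$ through cumulative weights on the sorted sample.

```python
from typing import Sequence


def weighted_gini(y: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical helper: plug-in Gini index sum_ij w_i w_j |y_i - y_j| / (2 N_hat Y_hat).

    On data sorted by y this equals sum_i w_i y_i (2 N_i - w_i) / (N_hat Y_hat) - 1,
    where N_i is the cumulative weight up to and including unit i.
    """
    pairs = sorted(zip(y, w))
    n_hat = sum(wi for _, wi in pairs)  # estimated population size
    y_hat = sum(wi * yi for yi, wi in pairs)  # estimated population total
    cum = 0.0  # running cumulative weight N_i
    acc = 0.0
    for yi, wi in pairs:
        cum += wi
        acc += wi * yi * (2.0 * cum - wi)
    return acc / (n_hat * y_hat) - 1.0


def weighted_quantile(y: Sequence[float], w: Sequence[float], alpha: float) -> float:
    """Smallest y whose weighted empirical CDF reaches alpha (one common convention)."""
    pairs = sorted(zip(y, w))
    total = sum(wi for _, wi in pairs)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= alpha * total:
            return yi
    return pairs[-1][0]


def low_income_proportion(
    y: Sequence[float], w: Sequence[float], beta: float = 0.6, alpha: float = 0.5
) -> float:
    """Weighted share of units below beta times the alpha-quantile (assumed 60% of the median)."""
    threshold = beta * weighted_quantile(y, w, alpha)
    return sum(wi for yi, wi in zip(y, w) if yi < threshold) / sum(w)
```

The same weight vector `w` is passed to both functions, which is the point of the single-weight-system approach: only the parameter formula changes.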

Using complex surveys to estimate the $L_1$-median of a functional variable: application to electricity load curves

Mohamed Chaouch, Camelia Goga, 2012

Mean profiles are widely used as indicators of customers' electricity consumption habits. Currently, at Électricité de France (EDF), class load profiles are estimated using the pointwise mean function. Unfortunately, the mean is well known to be highly sensitive to outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose a more robust alternative to the mean profile: the $L_1$-median profile. When dealing with large datasets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile without storing the whole dataset. We propose estimators of the median trajectory under several sampling strategies and compare them on a test population. We develop a stratification based on the linearized variable that substantially improves the accuracy of the estimator compared with simple random sampling without replacement. We also suggest an improved estimator that takes auxiliary information into account. Some potential areas for future research are also highlighted.

Statistics, Statistical applications, Methodology
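A design-weighted $L_1$-median of discretized curves minimizes $\sum_{k \in s} w_k \lVert Y_k - m \rVert$ over $m$. The sketch below computes it with the classical Weiszfeld fixed-point iteration. This is one standard algorithm for the problem and not necessarily the one used in the paper. Curves are assumed to be observed on a common time grid, and the Euclidean norm on the grid values stands in for the functional norm. The function name is hypothetical. Sampled curves that coincide with the current iterate are simply skipped, which is a crude shortcut rather than a careful treatment of that case.

```python
import math
from typing import List, Sequence

Curve = Sequence[float]


def _dist(a: Curve, b: Curve) -> float:
    """Euclidean distance between two curves observed on the same grid."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def weighted_l1_median(
    curves: Sequence[Curve], weights: Sequence[float], n_iter: int = 500, tol: float = 1e-9
) -> List[float]:
    """Hypothetical helper: Weiszfeld iteration for argmin_m sum_k w_k * ||y_k - m||."""
    dim = len(curves[0])
    total = sum(weights)
    # Start from the weighted mean curve.
    m = [sum(w * c[t] for c, w in zip(curves, weights)) / total for t in range(dim)]
    for _ in range(n_iter):
        num = [0.0] * dim
        den = 0.0
        for c, w in zip(curves, weights):
            d = _dist(c, m)
            if d < 1e-12:  # crude shortcut: skip a curve equal to the current iterate
                continue
            coef = w / d  # weight divided by distance
            den += coef
            for t in range(dim):
                num[t] += coef * c[t]
        if den == 0.0:
            break
        new_m = [v / den for v in num]
        if _dist(new_m, m) < tol:  # stop once the iterate no longer moves
            return new_m
        m = new_m
    return m
```

With one extreme curve among ordinary ones, this median stays near the bulk of the curves, while the weighted mean is pulled toward the outlier. That is the robustness argument made in the abstract.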
