On new data sources for the production of official statistics

Added by David Salgado
Publication date: 2020
Research language: English




In recent years we have witnessed the rise of new data sources for the potential production of official statistics, which, by and large, can be classified as survey, administrative, and digital data. Apart from the differences in their generation and collection, we claim that their lack of statistical metadata, their economic value, and the fact that statistical offices lack ownership of data held by external data holders pose several entangled challenges hindering the incorporation of new data into the routine production of official statistics. We argue that every challenge must be duly overcome by the international community to bring forth new statistical products based on these sources. These challenges can be naturally grouped into entangled issues regarding access to data, statistical methodology, quality, information technologies, and management. We identify the most relevant challenges that must necessarily be tackled before new data sources can be considered fully incorporated into the production of official statistics.



Related research

The field of data science currently enjoys a broad definition that includes a wide array of activities borrowed from many other established fields of study. Having such a vague characterization of a field in its early stages might be natural, but over time maintaining such a broad definition becomes unwieldy and impedes progress. In particular, the teaching of data science is hampered by the seeming need to cover many different points of interest. Data scientists must ultimately identify the core of the field by determining what makes the field unique and what it means to develop new knowledge in data science. In this review we attempt to distill some core ideas from data science by focusing on the iterative process of data analysis and developing some generalizations from past experience. Generalizations of this nature could form the basis of a theory of data science and would serve to unify and scale the teaching of data science to large audiences.
Jingjing Yang, Peng Ren (2016)
We provide a MATLAB toolbox, BFDA, that implements a Bayesian hierarchical model to smooth multiple functional data under the assumptions of a common underlying Gaussian process distribution, a Gaussian process prior for the mean function, and an Inverse-Wishart process prior for the covariance function. This model-based approach can borrow strength from all functional data to increase smoothing accuracy, as well as estimate the mean and covariance functions simultaneously. An option of approximating the Bayesian inference process using cubic B-spline basis functions is integrated in BFDA, which allows it to deal efficiently with high-dimensional functional data. Examples of using BFDA in various scenarios and of conducting follow-up functional regression are provided. The advantages of BFDA include: (1) it simultaneously smooths multiple functional data and estimates the mean and covariance functions in a nonparametric way; (2) it flexibly deals with sparse and high-dimensional functional data with stationary and nonstationary covariance functions, and without requiring common observation grids; (3) it provides accurately smoothed functional data for follow-up analysis.
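The smoothing idea behind BFDA can be illustrated outside MATLAB. The following Python sketch is not BFDA's API; the kernel choice, function names, and parameters are illustrative assumptions. It smooths multiple noisy curves with a Gaussian-process posterior mean, using the pooled cross-curve average as the prior mean so each curve borrows strength from the others:

```python
import numpy as np

def sq_exp_kernel(x, y, length=0.2, var=1.0):
    # Squared-exponential covariance k(x, y) = var * exp(-(x - y)^2 / (2 * length^2))
    d = x[:, None] - y[None, :]
    return var * np.exp(-d**2 / (2 * length**2))

def gp_smooth(t, curves, noise_var=0.05):
    """Posterior mean of a GP prior given noisy curves on a common grid t.

    Borrowing strength: the prior mean is the cross-curve average, so each
    smoothed curve is shrunk toward the pooled mean.
    """
    K = sq_exp_kernel(t, t)
    A = K + noise_var * np.eye(len(t))
    pooled_mean = curves.mean(axis=0)
    smoothed = np.empty_like(curves)
    for i, y in enumerate(curves):
        # Posterior mean: m + K (K + sigma^2 I)^{-1} (y - m)
        smoothed[i] = pooled_mean + K @ np.linalg.solve(A, y - pooled_mean)
    return smoothed

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
true = np.sin(2 * np.pi * t)                       # common underlying signal
curves = true + 0.3 * rng.standard_normal((20, 50))  # 20 noisy realizations
smoothed = gp_smooth(t, curves)
raw_err = np.mean((curves - true) ** 2)
smooth_err = np.mean((smoothed - true) ** 2)
```

Unlike BFDA, this sketch fixes the covariance function rather than placing an Inverse-Wishart process prior on it; it only shows the borrowing-strength mechanism.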
Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, in Electricite De France (EDF), class load profiles are estimated using the point-wise mean function. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose an alternative to the mean profile: the $L_1$-median profile, which is more robust. When dealing with large datasets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile while avoiding the need to store the whole dataset. Here we propose estimators of the median trajectory using several sampling strategies and estimators, and compare them by means of a test population. We develop a stratification based on the linearized variable that substantially improves the accuracy of the estimator compared to simple random sampling without replacement. We also suggest an improved estimator that takes auxiliary information into account. Some potential areas for future research are also highlighted.
Wei Xu, Wen Chen, Yingjie Liang (2017)
This study investigates the feasibility of the least-squares method for fitting non-Gaussian noisy data. We add different levels of two typical non-Gaussian noises, Lévy and stretched Gaussian noise, to the exact values of selected functions, including linear, polynomial, and exponential equations, and calculate the maximum absolute and mean square errors for the different cases. Lévy and stretched Gaussian distributions have many applications in fractional and fractal calculus. We observe that the non-Gaussian noises are fitted less accurately than Gaussian noise, with the stretched Gaussian cases performing better than the Lévy cases. We stress that the least-squares method becomes inapplicable in the non-Gaussian cases when the noise level exceeds 5%.
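A minimal illustration of the setup, assuming a standard Cauchy draw as a stand-in for heavy-tailed Lévy-stable noise (the paper's exact noise models, functions, and noise levels are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = np.linspace(0, 10, n)
true_a, true_b = 2.0, 1.0
y_exact = true_a * x + true_b                   # a linear test function

def ls_fit(y):
    # Ordinary least-squares fit of a line; returns (slope, intercept)
    A = np.vstack([x, np.ones(n)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Gaussian noise vs heavy-tailed noise (standard Cauchy = Levy-stable, alpha = 1)
a_g, b_g = ls_fit(y_exact + 0.5 * rng.standard_normal(n))
a_c, b_c = ls_fit(y_exact + 0.5 * rng.standard_cauchy(n))
gauss_err = abs(a_g - true_a) + abs(b_g - true_b)
cauchy_err = abs(a_c - true_a) + abs(b_c - true_b)
```

Under Gaussian noise the recovered coefficients are close to the true values; under heavy-tailed noise a few extreme draws can dominate the squared-error objective and distort the fit, which is the phenomenon the study quantifies.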
A freely available educational application (a mobile website) is presented. It provides access to educational material and drilling on selected topics within mathematics and statistics, with an emphasis on tablets and mobile phones. The application adapts to the student's performance, selecting from easy to difficult questions or reviewing older material. These adaptations are based on statistical models and analyses of data from testing precursors of the system within several courses, from calculus and introductory statistics through multiple linear regression. The application can be used in both online and offline modes. Its behavior is determined by parameters whose effects can be estimated statistically. Results presented include analyses of how the internal algorithms relate to passing a course and to general incremental improvement in knowledge during a semester.
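The adaptive question selection described above can be sketched with a simple Elo-style rule. This is an illustrative stand-in, not the application's actual statistical model; the update constant, item difficulties, and simulated student are all assumptions:

```python
import math
import random

def elo_update(ability, difficulty, correct, k=0.1):
    # Logistic prediction of a correct answer; both estimates move by
    # k times the prediction error (a simple Elo-style rule).
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return ability + k * (correct - p), difficulty - k * (correct - p)

def pick_item(ability, difficulties):
    # Choose the item whose difficulty is closest to the current ability
    # estimate, so the predicted success probability is near 50%.
    return min(range(len(difficulties)), key=lambda i: abs(difficulties[i] - ability))

random.seed(0)
true_ability = 1.5                       # simulated student, unknown to the app
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability = 0.0                            # initial ability estimate
for _ in range(200):
    i = pick_item(ability, difficulties)
    p_true = 1.0 / (1.0 + math.exp(-(true_ability - difficulties[i])))
    correct = 1 if random.random() < p_true else 0
    ability, _ = elo_update(ability, difficulties[i], correct)
# After many answers, the ability estimate hovers near the true ability,
# so the app keeps serving questions of appropriate difficulty.
```

Picking the item nearest the current ability estimate is the standard "maximally informative" heuristic: a question answered correctly about half the time reveals the most about the student.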
