Multilevel or hierarchical data structures can occur in many areas of research, including economics, psychology, sociology, agriculture, medicine, and public health. Over the last 25 years, there has been increasing interest in developing suitable techniques for the statistical analysis of multilevel data, and this has resulted in a broad class of models known under the generic name of multilevel models. Generally, multilevel models are useful for exploring how relationships vary across higher-level units while taking into account the within- and between-cluster variation. Research scientists often have substantive theories in mind when evaluating data with statistical models, and translating such a theory into a model frequently involves imposing inequality constraints among the parameters. This chapter shows how the inequality-constrained multilevel linear model can be given a Bayesian formulation, how the model parameters can be estimated using a so-called augmented Gibbs sampler, and how posterior probabilities can be computed to assist the researcher in model selection.
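As a minimal sketch of the general idea, assuming a toy random-intercept model, a single positivity constraint, and vague priors (none of which is the chapter's own model, priors, or notation), the following Python code runs a Gibbs sampler in which the constrained coefficient is drawn from its truncated-normal full conditional.

```python
# Gibbs sampler for  y_ij = b0 + b1 * x_ij + u_j + e_ij,  u_j ~ N(0, tau2),  e_ij ~ N(0, sigma2),
# with the inequality constraint b1 > 0 enforced through a truncated-normal full conditional.
import numpy as np
from scipy.stats import truncnorm, invgamma

rng = np.random.default_rng(0)

# simulate toy multilevel data: J groups with n observations each
J, n = 20, 15
x = rng.normal(size=(J, n))
u_true = rng.normal(scale=0.7, size=J)
y = 1.0 + 0.5 * x + u_true[:, None] + rng.normal(size=(J, n))

b0, b1, u = 0.0, 1.0, np.zeros(J)          # starting values
sigma2, tau2 = 1.0, 1.0
a0, c0 = 0.01, 0.01                        # assumed vague inverse-gamma hyperparameters
draws = []

for it in range(2000):
    resid = y - u[:, None]                 # data with the group effects removed

    # b0 | rest: normal full conditional (flat prior)
    r0 = resid - b1 * x
    b0 = rng.normal(r0.mean(), np.sqrt(sigma2 / r0.size))

    # b1 | rest: normal full conditional truncated to (0, inf) -- the constraint b1 > 0
    r1 = resid - b0
    prec = (x ** 2).sum() / sigma2
    m = (x * r1).sum() / sigma2 / prec
    s = 1.0 / np.sqrt(prec)
    b1 = truncnorm.rvs((0.0 - m) / s, np.inf, loc=m, scale=s, random_state=rng)

    # u_j | rest: group means shrunk toward zero
    r_u = (y - b0 - b1 * x).mean(axis=1)
    prec_u = n / sigma2 + 1.0 / tau2
    u = rng.normal((n / sigma2) * r_u / prec_u, 1.0 / np.sqrt(prec_u))

    # variance components | rest: inverse-gamma full conditionals
    e = y - b0 - b1 * x - u[:, None]
    sigma2 = invgamma.rvs(a0 + e.size / 2, scale=c0 + (e ** 2).sum() / 2, random_state=rng)
    tau2 = invgamma.rvs(a0 + J / 2, scale=c0 + (u ** 2).sum() / 2, random_state=rng)
    draws.append((b0, b1, sigma2, tau2))

print(np.mean(draws[500:], axis=0))        # posterior means of (b0, b1, sigma2, tau2) after burn-in
```

Orderings among several coefficients can be handled in the same coordinate-wise way, with each full conditional truncated to the interval allowed by the current values of the other parameters; the posterior probabilities used for model selection in the chapter are a separate computation not sketched here.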
Large renewable energy projects, such as large offshore wind farms, are critical to achieving low-emission targets set by governments. Stochastic computer models allow us to explore future scenarios to aid decision making whilst considering the most relevant uncertainties. Complex stochastic computer models can be prohibitively slow and thus an emulator may be constructed and deployed to allow for efficient computation. We present a novel heteroscedastic Gaussian Process emulator which exploits cheap approximations to a stochastic offshore wind farm simulator. We conduct a probabilistic sensitivity analysis to understand the influence of key parameters in the wind farm simulator which will help us to plan a probability elicitation in the future.
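One common way to build such an emulator, sketched here purely for illustration and not as the construction proposed in the paper, is to run the stochastic simulator with replication, fit one Gaussian process to the replicate means and a second to the log replicate variances, and feed the predicted noise levels back in as input-dependent nuggets; the toy simulator, kernels, and replication budget below are all assumptions.

```python
# Heteroscedastic GP sketch via two GPs (mean and log-variance), using scikit-learn.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def simulator(x, n_rep=20):
    """Toy stand-in for a stochastic simulator: output noise grows with the input."""
    return np.sin(3 * x) + rng.normal(scale=0.05 + 0.3 * x, size=n_rep)

X = np.linspace(0.0, 1.0, 25)[:, None]                    # design points
runs = np.stack([simulator(x) for x in X.ravel()])        # (25 designs, 20 replicates)
y_mean = runs.mean(axis=1)
y_logvar = np.log(runs.var(axis=1, ddof=1))

# GP for the noise level, modelled on the log scale so predictions stay positive
gp_var = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gp_var.fit(X, y_logvar)
noise = np.exp(gp_var.predict(X)) / runs.shape[1]         # variance of each sample mean

# GP for the mean response, with the input-dependent nugget supplied via `alpha`
gp_mean = GaussianProcessRegressor(ConstantKernel() * RBF(), alpha=noise)
gp_mean.fit(X, y_mean)

X_new = np.linspace(0.0, 1.0, 200)[:, None]
mu, sd = gp_mean.predict(X_new, return_std=True)          # emulator prediction and uncertainty
```

Dividing the predicted variance by the number of replicates reflects the fact that the mean GP is trained on sample means rather than single runs; a probabilistic sensitivity analysis would then propagate input distributions through the cheap emulator rather than through the slow simulator itself.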
During the last decades, many methods for the analysis of functional data, including classification methods, have been developed. Nonetheless, there are issues that have not been addressed satisfactorily by currently available methods, such as feature selection combined with variable selection when using multiple functional covariates. In this paper, a functional ensemble is combined with a penalized and constrained multinomial logit model. It is shown that this synthesis yields a powerful classification tool for functional data (possibly mixed with non-functional predictors), which also provides automatic variable selection. The choice of an appropriate, sparsity-inducing penalty allows most model coefficients to be estimated as exactly zero and permits class-specific coefficients in multiclass problems, such that feature selection is obtained. An additional constraint within the multinomial logit model ensures that the model coefficients can be considered as weights. Thus, the estimation results become interpretable with respect to the discriminative importance of the selected features, which is rated by a feature importance measure. In two application examples, data from a cell chip used for water quality monitoring experiments and phoneme data used for speech recognition, the interpretability as well as the selection results are examined. The classification performance is compared with various other commonly used classification approaches.
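The sparsity-inducing penalization step can be illustrated in isolation; the minimal sketch below omits the functional ensemble and the additional constraint that makes coefficients interpretable as weights, and the extracted features, penalty strength, and toy curves are assumptions.

```python
# L1-penalized multinomial logit on hand-made scalar features extracted from toy curves;
# the penalty drives some class-specific coefficients to exactly zero (feature selection).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# toy functional data: 300 curves observed on a grid of 100 points, 3 classes
t = np.linspace(0.0, 1.0, 100)
labels = rng.integers(0, 3, size=300)
curves = np.sin(2 * np.pi * np.outer(labels + 1, t)) + rng.normal(scale=0.5, size=(300, 100))

# simple stand-ins for the ensemble's feature types
features = np.column_stack([
    curves.mean(axis=1),                             # overall level
    curves.max(axis=1) - curves.min(axis=1),         # range
    np.abs(np.diff(curves, axis=1)).sum(axis=1),     # total variation
    curves[:, :50].mean(axis=1),                     # early-window mean
    rng.normal(size=300),                            # irrelevant feature the penalty should drop
])

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="saga", C=0.3, max_iter=5000),
)
clf.fit(features, labels)
print(clf.named_steps["logisticregression"].coef_)   # class-specific, mostly zero coefficients
```

In the paper's approach the penalized coefficients are additionally constrained so that they can be read as weights and rated by a feature importance measure; that constraint is not reproduced in this sketch.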
The underlying idea behind the construction of indices of economic inequality is to measure deviations of various portions of low incomes from certain references or benchmarks, which may be point measures, such as the population mean or median, or curves, such as the hypotenuse of the right triangle within which every Lorenz curve lies. In this paper we argue that by appropriately choosing population-based references, called societal references, and distributions of personal positions, called gambles, which are random, we can meaningfully unify classical and contemporary indices of economic inequality, as well as various measures of risk. To illustrate the proposed approach, we put forward and explore a risk measure that takes into account the relativity of large risks with respect to small ones.
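For a concrete instance of the classical deviation-from-a-benchmark construction, the sketch below computes the Gini index as twice the aggregate deviation of the empirical Lorenz curve from the egalitarian diagonal; this illustrates the classical side of the unification only, not the relative risk measure proposed in the paper, and the log-normal toy incomes are an assumption.

```python
# Gini index read off the Lorenz curve: 1 minus twice the area under L(p),
# i.e. twice the area between the egalitarian diagonal and the Lorenz curve.
import numpy as np

def lorenz_curve(incomes):
    """Return cumulative population shares p and cumulative income shares L(p)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    L = np.cumsum(x) / x.sum()
    return np.insert(p, 0, 0.0), np.insert(L, 0, 0.0)

def gini_from_lorenz(incomes):
    p, L = lorenz_curve(incomes)
    area_under_L = np.sum((p[1:] - p[:-1]) * (L[1:] + L[:-1]) / 2.0)   # trapezoidal rule
    return 1.0 - 2.0 * area_under_L

incomes = np.random.default_rng(3).lognormal(mean=0.0, sigma=0.8, size=10_000)
print(round(gini_from_lorenz(incomes), 3))
```

Replacing the diagonal benchmark by other societal references, or weighting the deviations by a distribution of personal positions, gives the kind of generalization the paper pursues.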
Heywood cases are known from the linear factor analysis literature as variables with communalities larger than 1.00, and in present-day factor models the problem also shows up as negative residual variances. For binary data, ordinal factor models can be applied with either the delta parameterization or the theta parameterization. The former is more common than the latter and can yield Heywood cases when limited-information estimation is used. The same problem shows up as nonconvergence in theta-parameterized factor models and as extremely large discriminations in item response theory (IRT) models. In this study, we explain why the same problem appears in different forms depending on the method of analysis. We first discuss this issue using equations and then illustrate our conclusions using a small simulation study, where all three methods, delta- and theta-parameterized ordinal factor models (with estimation based on polychoric correlations) and an IRT model (with full-information estimation), are used to analyze the same datasets. We also compare the performance of the WLS, WLSMV, and ULS estimators for the ordinal factor models. Finally, we analyze real data with the same three approaches. The results of the simulation study and the analysis of real data confirm the theoretical conclusions.
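Under the usual single-factor, unit-variance, normal-ogive setup, a brief worked identity makes the connection explicit:

\[
\operatorname{Var}(\varepsilon_j) \;=\; 1 - \lambda_j^{2} \quad \text{(delta parameterization)},
\qquad
\lambda_j^{\theta} \;=\; a_j \;=\; \frac{\lambda_j}{\sqrt{1 - \lambda_j^{2}}} \quad \text{(theta parameterization, normal-ogive IRT)} .
\]

A delta-parameterization loading estimated at or above one therefore implies a negative residual variance, while the corresponding theta-parameterization loading and IRT discrimination diverge, which surfaces as nonconvergence or an extremely large discrimination estimate.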
The vast majority of models for the spread of communicable diseases are parametric in nature and involve underlying assumptions about how the disease spreads through a population. In this article we consider the use of Bayesian nonparametric approaches to analysing data from disease outbreaks. Specifically, we focus on methods for estimating the infection process in simple models under the assumption that this process has an explicit time-dependence.
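To fix ideas, one standard simple model of this kind is the general stochastic (SIR) epidemic with a time-varying infection rate; the particular rate convention below, with the division by the population size N, is one common choice and is assumed here only for illustration. With S(t) susceptibles and I(t) infectives,

\[
(S, I) \;\to\; (S - 1,\, I + 1) \ \text{at rate}\ \beta(t)\,\frac{S(t)\,I(t)}{N},
\qquad
(S, I) \;\to\; (S,\, I - 1) \ \text{at rate}\ \gamma\, I(t),
\]

and the nonparametric object of interest is then the whole function \(\beta(\cdot)\) rather than a single scalar infection rate.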