68 - Jian Ma, Zengqi Sun 2019
Dependence structure estimation is an important problem in machine learning, with applications in many scientific areas. In this paper, a theoretical framework for such estimation is proposed, based on copula and copula entropy, the probabilistic theory of representation and measurement of statistical dependence. Graphical models are considered as a special case of the copula framework. A method within the framework for estimating the maximum spanning copula is proposed. Because it is built on copulas, the method is invariant to the marginal properties of the individual variables, insensitive to outliers, and able to deal with non-Gaussianity. Experiments on both simulated data and a real dataset demonstrate the effectiveness of the proposed method.
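As a concrete illustration of the rank-based construction described above, the following is a minimal sketch, assuming the usual two-step recipe (pseudo-observations from normalized ranks, then a Kozachenko-Leonenko k-nearest-neighbour entropy estimate); the function names are illustrative and this is not the authors' reference implementation.

```python
import numpy as np
from scipy.stats import rankdata
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def empirical_copula(x):
    """Pseudo-observations: map each column of x (n x d) to normalized ranks."""
    n = x.shape[0]
    return np.column_stack([rankdata(col) / n for col in x.T])

def kl_entropy(u, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate (in nats)."""
    n, d = u.shape
    dist, _ = cKDTree(u).query(u, k=k + 1)  # nearest hit is the point itself
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of unit d-ball
    return -digamma(k) + digamma(n) + log_ball + d * np.mean(np.log(dist[:, -1] + 1e-12))

def copula_entropy(x, k=3):
    """Copula entropy of x: entropy of its empirical copula. It equals minus
    the mutual information, so more negative means stronger dependence."""
    return kl_entropy(empirical_copula(x), k=k)
```

Because the estimate depends on the data only through ranks, it is unchanged by monotone transformations of the margins, which is the sense in which the method is irrelevant to the properties of individual variables.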
The use of quantiles to obtain insights about multivariate data is addressed. It is argued that incisive insights can be obtained by considering directional quantiles, the quantiles of projections. Directional quantile envelopes are proposed as a way to condense this kind of information; it is demonstrated that they are essentially halfspace (Tukey) depth level sets, coinciding for elliptic distributions (in particular the multivariate normal) with density contours. Relevant questions concerning their indexing, the possibility of reverse retrieval of directional quantile information, invariance with respect to affine transformations, and approximation/asymptotic properties are studied. It is argued that the analysis in terms of directional quantiles and their envelopes offers a straightforward probabilistic interpretation and thus conveys a concrete quantitative meaning; the directional definition can be adapted to more elaborate frameworks, such as the estimation of extreme quantiles and directional quantile regression, the regression of depth contours on covariates. The latter facilitates the construction of multivariate growth charts, the question that motivated all of this development.
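A minimal two-dimensional sketch of the construction: for a grid of unit directions, take the tau-quantile of the projected data, and treat the envelope as the intersection of the resulting halfspaces. The names and the choice of a fixed direction grid are illustrative assumptions.

```python
import numpy as np

def directional_quantiles(x, tau=0.9, n_dirs=360):
    """tau-quantiles of the projections of 2-D data x (n x 2)
    onto a grid of unit directions on the circle."""
    angles = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    return dirs, np.quantile(x @ dirs.T, tau, axis=0)

def in_envelope(points, dirs, q):
    """The directional quantile envelope is the intersection of the
    halfspaces {z : <u, z> <= q_u}; test membership for each point."""
    return np.all(points @ dirs.T <= q, axis=1)
```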
66 - Zhishen Ye, Jie Yang 2013
We propose a new method for dimension reduction in regression using the first two inverse moments. We develop corresponding weighted chi-squared tests for the dimension of the regression. The proposed method considers linear combinations of Sliced Inverse Regression (SIR) and a method using a new candidate matrix which is designed to recover the entire inverse second-moment subspace. The optimal combination may be selected based on the p-values derived from the dimension tests. Theoretically, the proposed method, like Sliced Average Variance Estimation (SAVE), is more capable of recovering the complete central dimension reduction subspace than SIR and Principal Hessian Directions (pHd). It can therefore substitute for SIR, pHd, SAVE, or any linear combination of them at a theoretical level. A simulation study indicates that the proposed method may have consistently greater power than SIR, pHd, and SAVE.
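For reference, here is a compact sketch of the SIR ingredient that the proposed combination builds on (the new candidate matrix itself is not reproduced here): whiten the predictors, slice on the response, and eigen-decompose the weighted covariance of the slice means. This is a generic textbook version, not the authors' code.

```python
import numpy as np

def sir(x, y, n_slices=10, n_dirs=2):
    """Sliced Inverse Regression: directions come from the eigenvectors of
    the weighted covariance of slice means of the whitened predictors."""
    n = x.shape[0]
    l = np.linalg.cholesky(np.cov(x, rowvar=False))
    l_inv = np.linalg.inv(l)
    z = (x - x.mean(0)) @ l_inv.T                 # whitened predictors
    slices = np.array_split(np.argsort(y), n_slices)
    m = sum(len(s) / n * np.outer(z[s].mean(0), z[s].mean(0)) for s in slices)
    _, vecs = np.linalg.eigh(m)                   # eigenvalues ascending
    beta = l_inv.T @ vecs[:, ::-1][:, :n_dirs]    # back to the original scale
    return beta / np.linalg.norm(beta, axis=0)
```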
Approximate Bayesian computation (ABC) or likelihood-free inference algorithms are used to find approximations to posterior distributions without making explicit use of the likelihood function, depending instead on simulation of sample data sets from the model. In this paper we show that under the assumption of the existence of a uniform additive model error term, ABC algorithms give exact results when sufficient summaries are used. This interpretation allows the approximation made in many previous application papers to be understood, and should guide the choice of metric and tolerance in future work. ABC algorithms can be generalized by replacing the 0-1 cut-off with an acceptance probability that varies with the distance of the simulated data from the observed data. The acceptance density gives the distribution of the error term, enabling the uniform error usually used to be replaced by a general distribution. This generalization can also be applied to approximate Markov chain Monte Carlo algorithms. In light of this work, ABC algorithms can be seen as calibration techniques for implicit stochastic models, inferring parameter values in light of the computer model, data, prior beliefs about the parameter values, and any measurement or model errors.
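A minimal sketch of the generalized acceptance step, using a Gaussian kernel in place of the 0-1 cutoff so that the acceptance density plays the role of the model-error distribution; all function arguments are placeholders to be supplied by the user.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_kernel_sampler(observed, prior_sampler, simulate, summary,
                       kernel_scale, n_draws=10_000):
    """Rejection ABC with a smooth acceptance kernel: a draw is kept with
    probability exp(-d^2 / (2*scale^2)) rather than iff d < tolerance,
    so accepted draws target the model with Gaussian measurement error."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        d = np.linalg.norm(summary(simulate(theta, rng)) - s_obs)
        if rng.random() < np.exp(-0.5 * (d / kernel_scale) ** 2):
            accepted.append(theta)
    return np.array(accepted)
```

Setting the kernel to an indicator of d < tolerance recovers the standard 0-1 cutoff as a special case.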
85 - Z. Bai, D. Jiang, J. Yao 2012
For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations, and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $p\le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations have been proposed in the literature for Wilks' $\Lambda$, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large $p$ and large $n$ context, but also for moderately large data dimensions such as $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data.
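For orientation, the statistic being corrected can be computed as follows; this sketch shows only the classical Wilks' Lambda for testing that all regression coefficients vanish, not the random-matrix correction, which is specified in the paper.

```python
import numpy as np

def wilks_lambda(x, y):
    """Classical Wilks' Lambda for testing B = 0 in Y = X B + E,
    where x is n x q and y is n x p: Lambda = det(E) / det(E + H)."""
    n = x.shape[0]
    x1 = np.column_stack([np.ones(n), x])        # add an intercept
    beta = np.linalg.lstsq(x1, y, rcond=None)[0]
    resid = y - x1 @ beta
    e = resid.T @ resid                          # residual SSP matrix
    yc = y - y.mean(0)
    t = yc.T @ yc                                # total SSP matrix
    h = t - e                                    # hypothesis SSP matrix
    return np.linalg.det(e) / np.linalg.det(e + h)
```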
148 - K. L. Mengersen 2012
Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The BCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.
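A toy sketch of the empirical-likelihood weighting idea for a one-dimensional mean, using Owen's dual formulation; BCel as developed in the paper involves problem-specific estimating equations and diagnostics, so the code below is only a schematic illustration.

```python
import numpy as np
from scipy.optimize import minimize

def log_el_ratio(g):
    """Log empirical-likelihood ratio for estimating-equation values g (n x q),
    via Owen's dual: log ELR = min over lam of -sum(log(1 + g @ lam)).
    Unbounded (very negative) values signal an infeasible parameter."""
    q = g.shape[1]
    def obj(lam):
        t = 1.0 + g @ lam
        return np.inf if np.any(t <= 1e-10) else -np.sum(np.log(t))
    return minimize(obj, np.zeros(q), method="Nelder-Mead").fun

def bcel_mean(x, prior_draws):
    """BCel-style importance sampling for a population mean: weight each
    prior draw by the empirical likelihood of the data at that value."""
    logw = np.array([log_el_ratio(x[:, None] - th) for th in prior_draws])
    w = np.exp(logw - logw.max())
    return prior_draws, w / w.sum()
```

The weighted prior draws then approximate the posterior without a single simulation from the data-generating model, which is the source of the time savings described above.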
Consider a two-by-two factorial experiment with more than one replicate. Suppose that we have uncertain prior information that the two-factor interaction is zero. We describe new simultaneous frequentist confidence intervals for the four population cell means, with simultaneous confidence coefficient $1-\alpha$, that utilize this prior information in the following sense. These simultaneous confidence intervals define a cube whose expected volume (a) is relatively small when the two-factor interaction is zero and (b) has a maximum value that is not too large. Also, these intervals coincide with the standard simultaneous confidence intervals obtained by Tukey's method, with simultaneous confidence coefficient $1-\alpha$, when the data strongly contradict the prior information that the two-factor interaction is zero. We illustrate the application of these new simultaneous confidence intervals on a real data set.
150 - Umberto Picchini 2012
Models defined by stochastic differential equations (SDEs) allow for the representation of random variability in dynamical systems. The relevance of this class of models is growing in many applied research areas, and they are already a standard tool for modelling, e.g., financial, neuronal, and population growth dynamics. However, inference for multidimensional SDE models is still very challenging, both computationally and theoretically. Approximate Bayesian computation (ABC) allows one to perform Bayesian inference for models which are sufficiently complex that the likelihood function is either analytically unavailable or computationally prohibitive to evaluate. A computationally efficient ABC-MCMC algorithm is proposed, halving the running time in our simulations. Focus is on the case where the SDE describes latent dynamics in state-space models; however, the methodology is not limited to the state-space framework. Simulation studies for a pharmacokinetic/pharmacodynamic model and for stochastic chemical reactions are considered, and a MATLAB package implementing our ABC-MCMC algorithm is provided.
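One generic way such a speed-up can arise, sketched below under the assumption of a symmetric random-walk proposal, is early rejection: evaluate the cheap Metropolis prior ratio before running the expensive model simulation. This is a schematic ABC-MCMC loop, not the paper's MATLAB package; all arguments are user-supplied placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_mcmc(s_obs, simulate_summaries, log_prior, theta0,
             tol, prop_scale, n_iter=5000):
    """ABC-MCMC with early rejection: test the prior ratio first, and only
    simulate summaries (the costly step, e.g. integrating an SDE) for
    proposals that survive it. Accept iff the summaries land within tol."""
    theta = np.asarray(theta0, dtype=float)
    chain = []
    for _ in range(n_iter):
        prop = theta + prop_scale * rng.standard_normal(theta.shape)
        # cheap Metropolis check before the expensive simulation
        if np.log(rng.random()) < log_prior(prop) - log_prior(theta):
            s_sim = simulate_summaries(prop, rng)
            if np.linalg.norm(s_sim - s_obs) < tol:
                theta = prop
        chain.append(theta.copy())
    return np.array(chain)
```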
Cook's distance [Technometrics 19 (1977) 15-18] is one of the most important diagnostic tools for detecting influential individual observations or subsets of observations in linear regression for cross-sectional data. However, for many complex data structures (e.g., longitudinal data), no rigorous approach has been developed to address a fundamental issue: deleting subsets with different numbers of observations introduces different degrees of perturbation to the current model fitted to the data, and the magnitude of Cook's distance is associated with the degree of the perturbation. The aim of this paper is to address this issue in general parametric models with complex data structures. We propose a new quantity for measuring the degree of the perturbation introduced by deleting a subset. We use stochastic ordering to quantify the stochastic relationship between the degree of the perturbation and the magnitude of Cook's distance. We develop several scaled Cook's distances to resolve the comparison of Cook's distance for different subset deletions. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of these scaled Cook's distances in a formal influence analysis.
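For reference, the classical single-case Cook's distance that the paper generalizes can be computed directly from the hat matrix; a minimal OLS sketch:

```python
import numpy as np

def cooks_distance(x, y):
    """Classical single-case Cook's distance for OLS with intercept:
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2)."""
    n = len(y)
    x1 = np.column_stack([np.ones(n), x])
    p = x1.shape[1]
    hat = x1 @ np.linalg.inv(x1.T @ x1) @ x1.T   # hat (projection) matrix
    h = np.diag(hat)                             # leverages h_ii
    resid = y - hat @ y
    s2 = resid @ resid / (n - p)                 # residual variance estimate
    return resid**2 * h / (p * s2 * (1 - h) ** 2)
```

Deleting a subset of size m perturbs the fit more than deleting a single case, which is exactly the comparability problem the scaled versions are designed to fix.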
This paper aims to enhance our understanding of substantive questions regarding self-reported happiness and well-being through the specification and use of multilevel models. To date, there have been numerous quantitative research studies of the happiness of individuals, based on single-level regression models, where typically a happiness index is related to a set of explanatory variables. There are also several single-level studies comparing aggregate happiness levels between countries. Nevertheless, there have been very few studies that attempt to simultaneously take into account variations in happiness and well-being at several different levels, such as individual, household, and area. Here, multilevel models are used with data from the British Household Panel Survey to assess the nature and extent of variations in happiness and well-being, and to determine the relative importance of area (district, region), household, and individual characteristics on these outcomes. Moreover, having taken into account the characteristics at these different levels in the multilevel models, the paper shows how it is possible to identify any areas that are associated with especially positive or negative feelings of happiness and well-being.
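A schematic of the kind of multilevel specification described, sketched with statsmodels on synthetic data; all variable names are illustrative stand-ins, not actual BHPS fields.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for a BHPS-like extract: individuals nested in
# households nested in districts (all names are hypothetical).
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "district": rng.integers(0, 20, n),
    "household": rng.integers(0, 200, n),
    "age": rng.normal(45, 12, n),
    "income": rng.normal(30, 8, n),
})
df["happiness"] = 0.02 * df["income"] - 0.01 * df["age"] + rng.normal(0, 1, n)

# Random intercepts at the district level plus a household variance
# component, with individual characteristics as fixed effects.
model = smf.mixedlm("happiness ~ age + income", df,
                    groups="district",
                    vc_formula={"household": "0 + C(household)"})
print(model.fit().summary())
```

The estimated variance components indicate how much of the variation in the outcome sits at each level, which is what the relative-importance comparison above rests on.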