
Choosing among notions of multivariate depth statistics

 Added by Pavlo Mozharovskyi
 Publication date 2020
Language: English





Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Accordingly, numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from the classical Mahalanobis distance does not come without cost: there is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions by computational limits; rather, several notions are often feasible, among which the researcher has to decide. The article discusses theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. The complexity and speed of exact algorithms are compared. The accuracy of approximate approaches such as the random Tukey depth is discussed, as well as the application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
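The two endpoints of this trade-off can be made concrete with a small sketch in plain Python (a toy bivariate illustration of ours, not code from the article or its R packages): the affine-invariant but outlier-sensitive Mahalanobis depth, and the random Tukey depth, which approximates the halfspace depth by minimizing a univariate depth over randomly drawn projection directions.

```python
import math
import random

random.seed(0)

def mahalanobis_depth(z, X):
    """Mahalanobis depth of z w.r.t. a 2-D sample X: 1 / (1 + d_M^2),
    where d_M is the Mahalanobis distance from the sample mean."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    sxx = sum((p[0] - mx) ** 2 for p in X) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in X) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in X) / (n - 1)
    det = sxx * syy - sxy * sxy        # determinant of the 2x2 covariance
    dx, dy = z[0] - mx, z[1] - my
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)

def random_tukey_depth(z, X, n_dir=500):
    """Random Tukey depth: minimum, over n_dir random directions u, of the
    univariate halfspace depth of <z,u> among the projections <x_i,u>."""
    n = len(X)
    depth = 1.0
    for _ in range(n_dir):
        a = random.uniform(0.0, math.pi)   # half circle suffices by symmetry
        ux, uy = math.cos(a), math.sin(a)
        zp = z[0] * ux + z[1] * uy
        below = sum(1 for p in X if p[0] * ux + p[1] * uy <= zp)
        above = sum(1 for p in X if p[0] * ux + p[1] * uy >= zp)
        depth = min(depth, min(below, above) / n)
    return depth

# a standard-normal point cloud for trying the two notions out
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
```

On such data both notions rank the centre above a far-away outlier; note, however, that the random Tukey depth only upper-bounds the exact halfspace depth, since the minimum is taken over finitely many directions.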



Related research

A data depth measures the centrality of a point with respect to an empirical distribution. Postulates are formulated which a depth for functional data should satisfy, and a general approach is proposed to construct multivariate data depths in Banach spaces. The new approach, referred to as Phi-depth, is based on depth infima over a proper set Phi of R^d-valued linear functions. Several desirable properties are established for the Phi-depth and a generalized version of it. The general notions include many new depths as special cases. In particular, a location-slope depth and a principal component depth are introduced.
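As a rough illustration of this construction (our own sketch, not the paper's implementation: the grid, the sample of curves and the choice of Phi below are all assumptions), take functions sampled on a grid, let Phi consist of a few point evaluations (linear functionals, here with d = 1), and define the depth as the infimum over Phi of a univariate halfspace depth of the projected value:

```python
def univ_depth(t, values):
    """Univariate halfspace depth: smaller fraction of points on either side of t."""
    n = len(values)
    le = sum(1 for v in values if v <= t)
    ge = sum(1 for v in values if v >= t)
    return min(le, ge) / n

def phi_depth(x, sample, Phi):
    """Phi-depth sketch: infimum over functionals phi in Phi of the depth
    of phi(x) within the projected sample {phi(y) : y in sample}."""
    return min(univ_depth(phi(x), [phi(y) for y in sample]) for phi in Phi)

# "functions" on a grid of 11 points, represented as tuples of values
grid = [k / 10 for k in range(11)]
curves = [tuple(t + 0.1 * s for t in grid) for s in range(21)]  # 21 shifted lines

# Phi: point evaluations at three grid positions (linear functionals, d = 1)
Phi = [lambda x, k=k: x[k] for k in (0, 5, 10)]
```

With these 21 parallel curves, the middle curve gets Phi-depth 11/21 and the lowest curve 1/21, since each point evaluation orders the curves identically.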
Karl Mosler 2012
In 1975 John Tukey proposed a multivariate median, the deepest point in a given data cloud in R^d. Later, in measuring the depth of an arbitrary point z with respect to the data, David Donoho and Miriam Gasko considered hyperplanes through z and determined its depth by the smallest portion of data separated by such a hyperplane. Since then, these ideas have proved extremely fruitful. A rich statistical methodology has developed that is based on data depth and, more generally, on nonparametric depth statistics. General notions of data depth have been introduced, as well as many special ones. These notions vary regarding their computability, their robustness, and their sensitivity in reflecting asymmetric shapes of the data. According to their different properties, they fit particular applications. The upper level sets of a depth statistic provide a family of set-valued statistics, named depth-trimmed or central regions. They describe the distribution regarding its location, scale and shape. The most central region serves as a median. The notion of depth has been extended from data clouds, that is, empirical distributions, to general probability distributions on R^d, thus allowing for laws of large numbers and consistency results. It has also been extended from d-variate data to data in functional spaces.
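For bivariate data, the halfspace depth of Donoho and Gasko can be computed exactly by an angular sweep: sort the angles of the data points around z and find the semicircle of directions containing the most of them. A plain-Python sketch of this classical idea (our own simplification; exact in exact arithmetic, though floating-point rounding can in principle flip ties between exactly antipodal angles):

```python
import math
from bisect import bisect_left

def halfspace_depth_2d(z, X):
    """Exact halfspace (Tukey) depth of z w.r.t. 2-D data X:
    (#points equal to z) + m - (max #points in a half-open semicircle
    of angles around z), all divided by |X|, where m counts points != z."""
    two_pi = 2.0 * math.pi
    angles = []
    n_eq = 0
    for p in X:
        dx, dy = p[0] - z[0], p[1] - z[1]
        if dx == 0.0 and dy == 0.0:
            n_eq += 1                 # points coinciding with z always count
        else:
            angles.append(math.atan2(dy, dx) % two_pi)
    angles.sort()
    m = len(angles)
    if m == 0:
        return n_eq / len(X)
    # unwrap the circle so every semicircle [a_i, a_i + pi) is an interval
    ext = angles + [a + two_pi for a in angles]
    best = max(bisect_left(ext, angles[i] + math.pi) - i for i in range(m))
    return (n_eq + m - best) / len(X)
```

For example, with the four points (0,0), (4,0), (0,4) and (1,1), the inner point (1,1) has depth 2/4, while the vertex (4,0) has the minimal depth 1/4.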
In this paper, a new mixture family of multivariate normal distributions, formed by mixing a multivariate normal distribution and a skewed distribution, is constructed. Some properties of this family, such as the characteristic function, the moment generating function, and the first four moments, are derived. The distributions of affine transformations and canonical forms of the model are also derived. An EM-type algorithm is developed for the maximum likelihood estimation of the model parameters. We have considered in detail some special cases of the family, using standard gamma and standard exponential mixture distributions, denoted by MMNG and MMNE, respectively. For the proposed family of distributions, different multivariate measures of skewness are computed. In order to examine the performance of the developed estimation method, simulation studies are carried out to show that the maximum likelihood estimates based on the EM-type algorithm do provide good performance. For different choices of the parameters of the MMNE distribution, several multivariate measures of skewness are computed and compared. Because some measures of skewness are scalar and some are vectors, in order to evaluate them properly, we have carried out a simulation study to determine the power of tests, based on samp
Abstract In Extreme Value methodology the choice of threshold plays an important role in efficient modelling of observations exceeding the threshold. The threshold must be chosen high enough to ensure an unbiased extreme value index but choosing the threshold too high results in uncontrolled variances. This paper investigates a generalized model that can assist in the choice of optimal threshold values in the gamma positive domain. A Bayesian approach is considered by deriving a posterior distribution for the unknown generalized parameter. Using the properties of the posterior distribution allows for a method to choose an optimal threshold without visual inspection.
Regression models describing the joint distribution of multivariate response variables conditional on covariate information have become an important aspect of contemporary regression analysis. However, a limitation of such models is that they often rely on rather simplistic assumptions, e.g. a constant dependency structure that is not allowed to vary with the covariates, or the restriction to linear dependence between the responses only. We propose a general framework for multivariate conditional transformation models that overcomes these limitations and describes the entire distribution in a tractable and interpretable yet flexible way, conditional on nonlinear effects of covariates. The framework can be embedded into likelihood-based inference, including results on asymptotic normality, and allows the dependence structure to vary with covariates. In addition, the framework scales well beyond bivariate response situations, which were the main focus of most earlier investigations. We illustrate the application of multivariate conditional transformation models in a trivariate analysis of childhood undernutrition and demonstrate empirically that our approach can be beneficial compared with existing benchmarks, in that complex, truly multivariate data-generating processes can be inferred from observations.