
Statistical Inference of Minimally Complex Models

Published by Matteo Marsili
Publication date: 2020
Research field: Information engineering
Language: English





Finding the best model that describes a high-dimensional dataset is a daunting task. For binary data, we show that this becomes feasible if the search is restricted to simple models. These models -- which we call Minimally Complex Models (MCMs) -- are simple because they are composed of independent components of minimal complexity, in terms of description length. Simple models are easy to infer and to sample from. In addition, model selection within the class of MCMs is invariant with respect to changes in the representation of the data. MCMs portray the structure of dependencies among variables in a simple way, and they provide robust predictions on dependencies and symmetries, as illustrated in several examples. MCMs may contain interactions between variables of any order, so, for example, our approach reveals whether a dataset is appropriately described by a pairwise-interaction model.
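
For intuition about the selection criterion, here is a minimal sketch of description-length-based model selection over independent components, assuming binary data, a brute-force search over partitions of the variables, and a BIC-style penalty as a stand-in for the stochastic complexity used in the paper (the data, function names, and penalty are illustrative simplifications, not the authors' algorithm):

```python
from collections import Counter
from math import log

import numpy as np

def component_log_likelihood(cols):
    """Max log-likelihood of a 'complete' component: n times minus the
    empirical entropy of the joint states of its variables."""
    n = cols.shape[0]
    counts = Counter(map(tuple, cols))
    return sum(c * log(c / n) for c in counts.values())

def mdl_score(data, partition):
    """Penalized log-likelihood of a partition into independent components;
    the (2^k - 1)/2 * log(n) term is a BIC-style complexity proxy."""
    n = data.shape[0]
    return sum(
        component_log_likelihood(data[:, comp])
        - 0.5 * (2 ** len(comp) - 1) * log(n)
        for comp in partition
    )

def all_partitions(items):
    """Enumerate all set partitions (feasible only for a handful of spins)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in all_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [block + [first]] + smaller[i + 1:]
        yield [[first]] + smaller

rng = np.random.default_rng(0)
# Synthetic binary data: variables 0 and 1 strongly coupled, variable 2 independent.
s0 = rng.integers(0, 2, size=(500, 1))
s1 = (s0 ^ (rng.random((500, 1)) < 0.1)).astype(int)
s2 = rng.integers(0, 2, size=(500, 1))
data = np.hstack([s0, s1, s2])

best = max(all_partitions(list(range(3))), key=lambda p: mdl_score(data, p))
print("selected partition:", best)  # expected: {0, 1} together, {2} alone
```
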


Read also

RockIt is a maximum a posteriori (MAP) query engine for statistical relational models. MAP inference in graphical models is an optimization problem which can be compiled to integer linear programs (ILPs). We describe several advances in translating MAP queries to ILP instances and present the novel meta-algorithm cutting plane aggregation (CPA). CPA exploits local context-specific symmetries and bundles up sets of linear constraints. The resulting counting constraints lead to more compact ILPs and make the symmetry of the ground model more explicit to state-of-the-art ILP solvers. Moreover, RockIt parallelizes most parts of the MAP inference pipeline, taking advantage of ubiquitous shared-memory multi-core architectures. We report on extensive experiments with Markov logic network (MLN) benchmarks showing that RockIt outperforms the state-of-the-art systems Alchemy, Markov TheBeast, and Tuffy both in terms of efficiency and quality of results.
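
To make the ILP compilation step concrete, the sketch below encodes MAP inference for a tiny invented weighted-clause model with the PuLP library. It shows only the textbook clause-to-ILP linearization; RockIt's grounding, cutting plane aggregation, and parallelization are not reproduced here:

```python
import pulp  # pip install pulp

# Toy weighted model: maximize 2.0*(x1 or x2) + 1.5*(not x2 or x3) - 0.5*x3
prob = pulp.LpProblem("toy_map", pulp.LpMaximize)
x = {i: pulp.LpVariable(f"x{i}", cat="Binary") for i in (1, 2, 3)}

# One binary indicator per soft clause; each constraint lets the indicator
# be 1 only when at least one literal of the clause is satisfied.
c1 = pulp.LpVariable("c1", cat="Binary")  # c1 <-> (x1 or x2)
c2 = pulp.LpVariable("c2", cat="Binary")  # c2 <-> (not x2 or x3)
prob += c1 <= x[1] + x[2]
prob += c2 <= (1 - x[2]) + x[3]

# The MAP objective is the weighted sum of satisfied clauses.
prob += 2.0 * c1 + 1.5 * c2 - 0.5 * x[3]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({v.name: int(v.value()) for v in prob.variables()})
```

CPA would go one step further and bundle symmetric constraints of this kind into counting constraints, which is what makes the ground ILPs more compact.
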
We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive gluing of planar and $O(1)$-sized components and subsets of at most three vertices into a tree. The polynomial-time algorithm of the dynamic programming type for solving exact inference (computing the partition function) and exact sampling (generating i.i.d. samples) consists in a sequential application of an efficient (for planar) or brute-force (for $O(1)$-sized) inference and sampling to the components as a black box. To illustrate the utility of the new family of tractable graphical models, we first build a polynomial algorithm for inference and sampling of zero-field Ising models over $K_{3,3}$-minor-free topologies and over $K_{5}$-minor-free topologies -- both are extensions of the planar zero-field Ising models -- which are neither genus- nor treewidth-bounded. Second, we demonstrate empirically an improvement in the approximation quality of the NP-hard problem of inference over the square-grid Ising model in a node-dependent non-zero magnetic field.
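
As a minimal sketch of the black-box component step, the code below computes the partition function of a small zero-field Ising component by brute force over spin configurations (the couplings are invented; the polynomial planar inference and the sequential gluing along the tree of components are not shown):

```python
import itertools
from math import exp

def ising_partition_function(couplings, n_spins):
    """Brute-force Z = sum over s in {-1,+1}^n of exp(sum_{(i,j)} J_ij s_i s_j).

    Exponential in n_spins, so in the paper's construction this would only
    ever serve as the black box for the O(1)-sized components; planar
    components admit polynomial (Pfaffian-based) inference instead.
    """
    Z = 0.0
    for s in itertools.product([-1, 1], repeat=n_spins):
        energy = sum(J * s[i] * s[j] for (i, j), J in couplings.items())
        Z += exp(energy)
    return Z

# A 3-spin triangle component with invented couplings.
couplings = {(0, 1): 0.5, (1, 2): -0.3, (0, 2): 0.2}
print("Z =", ising_partition_function(couplings, 3))
```
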
Andrew Ying, Wang Miao, Xu Shi (2021)
A standard assumption for causal inference about the joint effects of time-varying treatments is that one has measured sufficient covariates to ensure that, within covariate strata, subjects are exchangeable across observed treatment values, also known as the sequential randomization assumption (SRA). SRA is often criticized as it requires one to accurately measure all confounders. Realistically, measured covariates can rarely capture all confounders with certainty. Often covariate measurements are at best proxies of confounders, thus invalidating inferences under SRA. In this paper, we extend the proximal causal inference (PCI) framework of Miao et al. (2018) to the longitudinal setting under a semiparametric marginal structural mean model (MSMM). PCI offers an opportunity to learn about joint causal effects in settings where SRA based on measured time-varying covariates fails, by formally accounting for the covariate measurements as imperfect proxies of underlying confounding mechanisms. We establish nonparametric identification with a pair of time-varying proxies and provide a corresponding characterization of regular and asymptotically linear estimators of the parameter indexing the MSMM, including a rich class of doubly robust estimators, and establish the corresponding semiparametric efficiency bound for the MSMM. Extensive simulation studies and a data application illustrate the finite sample behavior of the proposed methods.
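
For intuition about the proximal idea in the simplest possible setting, a single time point with a linear outcome bridge (all variable names, the data-generating process, and the linearity assumption are mine; this is not the paper's longitudinal MSMM machinery), the bridge moment conditions can be solved with the treatment-inducing proxy acting like an instrument for the outcome-inducing proxy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
U = rng.normal(size=n)                     # unmeasured confounder
Z = U + rng.normal(size=n)                 # treatment-inducing proxy
W = U + rng.normal(size=n)                 # outcome-inducing proxy
A = (U + rng.normal(size=n) > 0).astype(float)  # confounded treatment
Y = 2.0 * A + U + rng.normal(size=n)       # true causal effect tau = 2

# Linear outcome bridge h(W, A) = b0 + b1*W + tau*A, identified by the
# moment conditions E[(Y - h(W, A)) * g] = 0 for g in {1, Z, A}.
G = np.column_stack([np.ones(n), Z, A])    # instruments
X = np.column_stack([np.ones(n), W, A])    # bridge regressors
b0, b1, tau_hat = np.linalg.solve(G.T @ X, G.T @ Y)

print("proximal estimate of tau:", tau_hat)                  # close to 2.0
print("naive OLS slope of Y on A:", np.polyfit(A, Y, 1)[0])  # biased upward
```
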
In this paper, using the shrinkage-based approach for portfolio weights and modern results from random matrix theory, we construct an effective procedure for testing the efficiency of the expected utility (EU) portfolio and discuss the asymptotic behavior of the proposed test statistic under the high-dimensional asymptotic regime, namely when the number of assets $p$ increases at the same rate as the sample size $n$ such that their ratio $p/n$ approaches a positive constant $c \in (0,1)$ as $n \to \infty$. We provide an extensive simulation study where the power function and receiver operating characteristic curves of the test are analyzed. In the empirical study, the methodology is applied to the returns of S&P 500 constituents.
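
The sketch below only sets up the objects involved, sample EU portfolio weights built from a linearly shrunk covariance estimator in the $p/n = c$ regime (the shrinkage intensity and all market parameters are invented, and the paper's actual test statistic and its distribution theory are not reproduced):

```python
import numpy as np

def eu_weights(mu, sigma_inv, gamma):
    """EU portfolio: argmax_w w'mu - (gamma/2) w'Sigma w subject to sum(w) = 1."""
    ones = np.ones(len(mu))
    denom = ones @ sigma_inv @ ones
    gmv = sigma_inv @ ones / denom                       # minimum-variance part
    Q = sigma_inv - np.outer(sigma_inv @ ones, ones @ sigma_inv) / denom
    return gmv + (Q @ mu) / gamma

rng = np.random.default_rng(2)
p, n, gamma = 100, 250, 5.0            # high-dimensional regime: c = p/n = 0.4
mu_true = rng.normal(0.001, 0.002, p)
returns = rng.normal(mu_true, 0.02, size=(n, p))

mu_hat = returns.mean(axis=0)
S = np.cov(returns, rowvar=False)
# Linear shrinkage toward a scaled identity, a simple stand-in for the
# shrinkage estimator analyzed in the paper; rho is fixed arbitrarily here.
rho = 0.3
S_shrunk = (1 - rho) * S + rho * (np.trace(S) / p) * np.eye(p)

w = eu_weights(mu_hat, np.linalg.inv(S_shrunk), gamma)
w_sample = eu_weights(mu_hat, np.linalg.inv(S), gamma)
print("sum of weights:", w.sum())                        # 1 by construction
print("shrinkage vs plug-in gap:", np.linalg.norm(w - w_sample))
```
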
Johannes Buchner (2017)
The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties is often an afterthought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require independent runs for each data set, implying an unfeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions about homogeneous errors, Gaussianity, the form of the model or heterogeneity/completeness of the observations need to be made. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.
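
For background, here is a minimal sketch of classic single-dataset nested sampling on a toy one-dimensional problem, the per-dataset baseline whose cost collaborative nested sampling amortizes across observations (the rejection-sampling constrained draw and all parameters are illustrative; this is not the collaborative algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_likelihood(theta):
    """Toy Gaussian likelihood at 0.5 (width 0.01) on a uniform [0, 1] prior."""
    return -0.5 * ((theta - 0.5) / 0.01) ** 2

n_live, n_iter = 100, 600
live = rng.random(n_live)
live_logl = log_likelihood(live)
log_z, log_x = -np.inf, 0.0            # evidence and prior-volume accumulators

for i in range(n_iter):
    worst = np.argmin(live_logl)
    log_x_new = -(i + 1) / n_live      # E[log X] shrinks by 1/n_live per step
    log_w = np.log(np.exp(log_x) - np.exp(log_x_new)) + live_logl[worst]
    log_z = np.logaddexp(log_z, log_w)
    log_x = log_x_new
    # Replace the worst point with a prior draw above its likelihood
    # (naive rejection sampling; real implementations are cleverer here).
    while True:
        cand = rng.random()
        if log_likelihood(cand) > live_logl[worst]:
            break
    live[worst] = cand
    live_logl[worst] = log_likelihood(cand)

# Remaining contribution of the live points.
log_z = np.logaddexp(log_z, log_x + np.log(np.mean(np.exp(live_logl))))
print("log-evidence estimate:", log_z)  # analytic: ln(0.01*sqrt(2*pi)) ~ -3.69
```
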
