The number of $n$-gaussoids is shown to be a double exponential function in $n$. The necessary bounds are achieved by studying construction methods for gaussoids that rely on prescribing $3$-minors and encoding the resulting combinatorial constraints in a suitable transitive graph. Various special classes of gaussoids arise from restricting the allowed $3$-minors.
A gaussoid is a combinatorial structure that encodes independence in probability and statistics, just like matroids encode independence in linear algebra. The gaussoid axioms of Lnenicka and Matus are equivalent to compatibility with certain quadratic relations among principal and almost-principal minors of a symmetric matrix. We develop the geometric theory of gaussoids, based on the Lagrangian Grassmannian and its symmetries. We introduce oriented gaussoids and valuated gaussoids, thus connecting to real and tropical geometry. We classify small realizable and non-realizable gaussoids. Positive gaussoids are as nice as positroids: they are all realizable via graphical models.
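As a small illustration of how almost-principal minors encode independence, the sketch below lists the conditional independence statements realized by a symmetric positive definite matrix. It relies on the standard fact that, for a Gaussian with covariance matrix $\Sigma$, $i$ and $j$ are independent given $K$ exactly when the almost-principal minor $\det \Sigma_{iK,jK}$ vanishes. The function names, the 0-based indexing, and the example matrix are assumptions made for this sketch and do not come from the abstract.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if not m:
        return Fraction(1)  # determinant of the empty matrix
    total = Fraction(0)
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def realized_gaussoid(sigma):
    """Return all triples (i, j, K) whose almost-principal minor of sigma is zero.

    sigma is a symmetric matrix given as a list of lists of Fractions;
    i < j are 0-based indices and K is a tuple of the conditioning indices.
    """
    n = len(sigma)
    result = set()
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for size in range(len(rest) + 1):
            for K in combinations(rest, size):
                rows = (i,) + K
                cols = (j,) + K
                sub = [[sigma[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    result.add((i, j, K))
    return result


half = Fraction(1, 2)
# Hypothetical example: variables 0 and 2 are uncorrelated, but both correlate with 1.
sigma = [
    [Fraction(1), half, Fraction(0)],
    [half, Fraction(1), half],
    [Fraction(0), half, Fraction(1)],
]
print(realized_gaussoid(sigma))
```

Exact rational arithmetic is used so that the test for a vanishing minor is not affected by floating-point rounding. In this example the only statement that holds is the marginal independence of variables 0 and 2; conditioning on variable 1 destroys it.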
Two separate statistical tests are described and developed in order to test un-binned data sets for adherence to the power-law form. The first test employs the TP-statistic, a function defined to deviate from zero when the sample deviates from the power-law form, regardless of the value of the power index. The second test employs a likelihood ratio test to reject a power-law background in favor of a model signal distribution with a cut-off.
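The abstract does not state the formula for the TP-statistic. The sketch below assumes one commonly cited form, $TP(u) = \big(\tfrac{1}{n}\sum_k \ln(x_k/u)\big)^2 - \tfrac{1}{2n}\sum_k \ln^2(x_k/u)$, taken over the observations $x_k \ge u$. For an exact power-law tail with any index $b$, $\ln(x/u)$ is exponentially distributed with mean $1/b$ and second moment $2/b^2$, so the two terms cancel in expectation. The function name, threshold, and sample sizes are illustrative choices, and the likelihood ratio test is not covered here.

```python
import math
import random


def tp_statistic(sample, u):
    """TP-statistic of the observations at or above the threshold u (assumed form)."""
    logs = [math.log(x / u) for x in sample if x >= u]
    if not logs:
        raise ValueError("no observations at or above the threshold u")
    n = len(logs)
    mean_log = sum(logs) / n
    mean_log_sq = sum(v * v for v in logs) / n
    # Squared first log-moment minus half the second log-moment.
    return mean_log ** 2 - 0.5 * mean_log_sq


random.seed(0)
# Pareto sample with shape 2 and minimum 1: the statistic should be close to zero.
pareto_sample = [random.paretovariate(2.0) for _ in range(20000)]
# Uniform sample on [1, 2]: not a power law, so the statistic should move away from zero.
uniform_sample = [random.uniform(1.0, 2.0) for _ in range(20000)]

print(tp_statistic(pareto_sample, 1.0))
print(tp_statistic(uniform_sample, 1.0))
```

Because the cancellation holds for every value of the index, no estimate of the power index is needed before applying the statistic, which matches the property described in the abstract.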
Estimation of population size using incomplete lists (also called the capture-recapture problem) has a long history across many biological and social sciences. For example, human rights and other groups often construct partial and overlapping lists of victims of armed conflicts, with the hope of using this information to estimate the total number of victims. Earlier statistical methods for this setup either use potentially restrictive parametric assumptions, or else rely on typically suboptimal plug-in-type nonparametric estimators; however, both approaches can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under an identifying assumption that two lists are conditionally independent given measured covariate information, we make several contributions. First, we derive the nonparametric efficiency bound for estimating the capture probability, which indicates the best possible performance of any estimator, and sheds light on the statistical limits of capture-recapture methods. Then we present a new estimator, and study its finite-sample properties, showing that it has a double robustness property new to capture-recapture, and that it is near-optimal in a non-asymptotic sense, under relatively mild nonparametric conditions. Next, we give a method for constructing confidence intervals for total population size from generic capture probability estimators, and prove non-asymptotic near-validity. Finally, we study our methods in simulations, and apply them to estimate the number of killings and disappearances attributable to different groups in Peru during its internal armed conflict between 1980 and 2000.
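For orientation, the simplest version of the two-list problem ignores covariates and assumes that the lists are marginally independent. Under that assumption the classical Lincoln-Petersen estimator applies, sketched below. This is a baseline only and is not the doubly robust estimator proposed in the abstract; the function names and the counts in the example are hypothetical.

```python
def lincoln_petersen(n1, n2, n12):
    """Classical two-list population size estimate n1 * n2 / n12.

    n1, n2: number of individuals on list 1 and on list 2
    n12:    number of individuals appearing on both lists
    """
    if n12 <= 0:
        raise ValueError("the estimator is undefined when the lists do not overlap")
    return n1 * n2 / n12


def capture_probability(n1, n2, n12):
    """Estimated probability of appearing on at least one of the two lists."""
    observed = n1 + n2 - n12  # distinct individuals seen on either list
    return observed / lincoln_petersen(n1, n2, n12)


# Hypothetical counts: 200 on list 1, 150 on list 2, 30 on both.
print(lincoln_petersen(200, 150, 30))
print(capture_probability(200, 150, 30))
```

Dividing the number of distinct observed individuals by the capture probability recovers the population size estimate, which is why the abstract concentrates on estimating the capture probability.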
A neighborliness property of marginal polytopes of hierarchical models, depending on the cardinality of the smallest non-face of the underlying simplicial complex, is shown. The case of binary variables is studied explicitly, then the general case is reduced to the binary case. A Markov basis for binary hierarchical models whose simplicial complex is the complement of an interval is given.
Let $\boldsymbol{s} = (s_1,\ldots,s_m)$ and $\boldsymbol{t} = (t_1,\ldots,t_n)$ be vectors of nonnegative integer-valued functions of $m,n$ with equal sum $S = \sum_{i=1}^m s_i = \sum_{j=1}^n t_j$. Let $M(\boldsymbol{s},\boldsymbol{t})$ be the number of $m \times n$ matrices with nonnegative integer entries such that the $i$-th row has row sum $s_i$ and the $j$-th column has column sum $t_j$ for all $i,j$. Such matrices occur in many different settings, an important example being the contingency tables (also called frequency tables) used in statistics. Define $s=\max_i s_i$ and $t=\max_j t_j$. Previous work has established the asymptotic value of $M(\boldsymbol{s},\boldsymbol{t})$ as $m,n\to\infty$ with $s$ and $t$ bounded (various authors independently, 1971-1974), and when $\boldsymbol{s},\boldsymbol{t}$ are constant vectors with $m/n,\, n/m,\, s/n \ge c/\log n$ for sufficiently large $c$ (Canfield and McKay, 2007). In this paper we extend the sparse range to the case $st=o(S^{2/3})$. The proof in part follows a previous asymptotic enumeration of 0-1 matrices under the same conditions (Greenhill, McKay and Wang, 2006). We also generalise the enumeration to matrices over any subset of the nonnegative integers that includes 0 and 1.
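To make the counted quantity concrete, the following sketch computes the number of nonnegative integer matrices with prescribed row and column sums by exhaustive recursion. It is only practical for very small inputs and has nothing to do with the asymptotic method of the paper; the function name `count_matrices` is an assumption made for illustration.

```python
from functools import lru_cache


def count_matrices(row_sums, col_sums):
    """Count nonnegative integer matrices with the given row and column sums."""
    if sum(row_sums) != sum(col_sums):
        return 0
    rows = tuple(row_sums)

    def row_fillings(total, caps):
        """Yield every tuple x with 0 <= x[j] <= caps[j] and sum(x) == total."""
        if not caps:
            if total == 0:
                yield ()
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in row_fillings(total - first, caps[1:]):
                yield (first,) + rest

    @lru_cache(maxsize=None)
    def count(i, remaining):
        """Number of ways to fill rows i, i+1, ... given the remaining column sums."""
        if i == len(rows):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        for filling in row_fillings(rows[i], remaining):
            new_remaining = tuple(c - x for c, x in zip(remaining, filling))
            total += count(i + 1, new_remaining)
        return total

    return count(0, tuple(col_sums))


print(count_matrices((1, 1, 1), (1, 1, 1)))  # the 3 x 3 permutation matrices
print(count_matrices((2, 2), (2, 2)))
```

With all row and column sums equal to 1 the count is the number of permutation matrices, which gives a quick sanity check. Memoizing on the remaining column sums avoids recomputing identical subproblems, but the running time still grows quickly with the matrix size.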