Spectral goodness-of-fit tests for complete and partial network data


Published by Tyler McCormick

Publication date: 2021

Research field: Mathematical Statistics, Informatics Engineering

Paper language: English

Authors: Shane Lubold, Bolun Liu, Tyler H. McCormick

Methodology, Machine Learning, Social and Information Networks


Abstract

Networks describe the often complex relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well and will extrapolate to similar data. We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters in a number of commonly used network models. For example, we show how to select the dimension of the latent space in latent space models. Unlike other network goodness-of-fit methods, our general approach does not require simulating from a candidate parametric model, which can be cumbersome with large graphs, and it eliminates the need to choose a particular set of graph statistics for comparison. It also allows us to perform goodness-of-fit tests on partial network data, such as Aggregated Relational Data. We show through simulations that our method performs well in many situations of interest. We analyze several empirically relevant networks and show that our method leads to improved community detection algorithms. R code to implement our method is available on GitHub.
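The abstract does not state the test statistic. The Python sketch below is an assumption on our part and is not the authors' R implementation. It illustrates one way a spectral goodness-of-fit check can work:

- It standardizes the residuals A_ij − P̂_ij of a fitted model to unit variance on the 1/(n − 1) scale.
- It takes the two extreme eigenvalues of the resulting matrix.
- It rescales them as n^(2/3)(λ − 2). For Wigner-type random matrices, this is the edge scaling under which the largest eigenvalue has Tracy-Widom fluctuations.

The statistic is then used to choose the number of blocks of a stochastic block model sequentially, which is one instance of the parameter selection the abstract mentions. Several choices are illustrative only:

- the helper names `sample_sbm`, `fit_sbm`, `spectral_labels`, `spectral_gof_statistic` and `select_num_blocks`;
- the plain spectral clustering step;
- the default threshold of 0.98, roughly the upper 5% point of the Tracy-Widom (β = 1) law.

In finite samples the null distribution can be noticeably shifted, so treat that threshold as a rough guide.

```python
import numpy as np


def sample_sbm(z, B, rng):
    """Draw a symmetric 0/1 adjacency matrix without self-loops from an SBM."""
    P = B[np.ix_(z, z)]
    upper = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    return upper + upper.T


def fit_sbm(A, z, K):
    """Plug-in SBM fit: block-pair edge densities expanded to an n x n matrix."""
    B = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            in_a, in_b = z == a, z == b
            n_a, n_b = in_a.sum(), in_b.sum()
            # ordered pairs within a block, all pairs across two blocks
            pairs = n_a * (n_a - 1) if a == b else n_a * n_b
            B[a, b] = A[np.ix_(in_a, in_b)].sum() / max(pairs, 1)
    return B[np.ix_(z, z)]


def spectral_labels(A, K, n_iter=50):
    """Hypothetical helper: k-means on the K leading (by |eigenvalue|) eigenvectors."""
    n = A.shape[0]
    if K == 1:
        return np.zeros(n, dtype=int)
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(np.abs(vals))[-K:]]
    # deterministic farthest-point initialization
    centers = [X[0]]
    for _ in range(K - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):  # plain Lloyd iterations
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = dist.argmin(axis=1)
        new = np.array([X[z == k].mean(axis=0) if np.any(z == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return z


def spectral_gof_statistic(A, P_hat):
    """n^(2/3) * (extreme eigenvalue - 2) of the standardized residual matrix."""
    n = A.shape[0]
    P = np.clip(P_hat, 1e-6, 1 - 1e-6)  # guard against zero variance
    R = (A - P) / np.sqrt((n - 1) * P * (1 - P))
    np.fill_diagonal(R, 0.0)  # self-loops are not observed
    eig = np.linalg.eigvalsh(R)  # ascending order
    scale = n ** (2 / 3)
    return max(scale * (eig[-1] - 2.0), scale * (-eig[0] - 2.0))


def select_num_blocks(A, K_max=5, threshold=0.98):
    """Return the first K whose fitted SBM is not rejected, with its statistic."""
    for K in range(1, K_max + 1):
        z = spectral_labels(A, K)
        stat = spectral_gof_statistic(A, fit_sbm(A, z, K))
        if stat < threshold:
            return K, stat
    return K_max, stat
```

For example, `select_num_blocks(sample_sbm(np.repeat([0, 1], 250), np.array([[0.3, 0.1], [0.1, 0.3]]), np.random.default_rng(0)))` tests K = 1, 2, … in turn. Fitting a one-block model to such a graph leaves a strong low-rank signal in the residuals, which pushes the top eigenvalue far past 2. For a latent space model, the same loop would run over the latent dimension with a different fitting step.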


Read also

Data-driven goodness-of-fit tests

Mikhail Langovoy, 2017

We propose and study a general method for constructing consistent statistical tests on the basis of possibly indirect, corrupted, or partially available observations. The class of tests devised in the paper contains Neyman's smooth tests, data-driven score tests, and some types of multi-sample tests as basic examples. Our tests are data-driven and are additionally incorporated with model selection rules. The method allows the use of a wide class of model selection rules that are based on the penalization idea. In particular, many of the optimal penalties derived in the statistical literature can be used in our tests. We establish the behavior of model selection rules and data-driven tests under both the null hypothesis and the alternative hypothesis, derive an explicit detectability rule for alternative hypotheses, and prove a master consistency theorem for the tests from the class. The paper shows that the tests are applicable to a wide range of problems, including hypothesis testing in statistical inverse problems, multi-sample problems, and nonparametric hypothesis testing.
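The simplest basic example of this class is Neyman's smooth test with a data-driven number of components. The sketch below is the classical version for testing uniformity of probability-integral-transformed data u_i in [0, 1]. It uses Legendre polynomials rescaled to be orthonormal on [0, 1], and it selects the number of components k with a Schwarz-type penalty k log n. It does not reproduce the paper's general construction or its other penalties.

Several choices are assumptions for illustration:

- the function names;
- the maximum dimension d = 10;
- the χ²₁ critical value 3.841 at the 5% level. This is the usual asymptotic reference, since under the null the selected dimension tends to 1. In small samples, simulated critical values may be more accurate.

```python
import numpy as np
from numpy.polynomial import legendre


def neyman_components(u, d):
    """Return n^(-1/2) * sum_i phi_j(u_i) for j = 1..d.

    phi_j(u) = sqrt(2j + 1) * P_j(2u - 1) are Legendre polynomials rescaled
    to be orthonormal on [0, 1]; each has mean zero under U(0, 1).
    """
    u = np.asarray(u, dtype=float)
    x = 2.0 * u - 1.0  # map [0, 1] onto [-1, 1], the Legendre domain
    comps = np.empty(d)
    for j in range(1, d + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0  # select P_j alone
        phi = np.sqrt(2 * j + 1) * legendre.legval(x, coef)
        comps[j - 1] = phi.sum() / np.sqrt(len(u))
    return comps


def data_driven_smooth_test(u, d=10, crit=3.841):
    """Data-driven Neyman smooth test of uniformity (Schwarz-type selection).

    Returns (S, T_S, reject), where S maximizes T_k - k log n over k = 1..d.
    """
    n = len(u)
    T = np.cumsum(neyman_components(u, d) ** 2)  # T_1, ..., T_d
    k = np.arange(1, d + 1)
    S = int(k[np.argmax(T - k * np.log(n))])  # argmax keeps the first maximizer
    T_S = float(T[S - 1])
    return S, T_S, T_S > crit
```

The penalty plays the role of a model selection rule. Here the "model" is the number of smooth components, and any other penalty from the class could replace `k * np.log(n)`.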

Statistics Theory, Probability, Methodology

Interpretable Stein Goodness-of-fit Tests on Riemannian Manifolds

Wenkai Xu, Takeru Matsuda, 2021

In many applications, we encounter data on Riemannian manifolds such as the torus and rotation groups. Standard statistical procedures for multivariate data are not applicable to such data. In this study, we develop goodness-of-fit testing and interpretable model criticism methods for general distributions on Riemannian manifolds, including those with an intractable normalization constant. The proposed methods are based on extensions of kernel Stein discrepancy, which are derived from Stein operators on Riemannian manifolds. We discuss the connections between the proposed tests and existing ones and provide a theoretical analysis of their asymptotic Bahadur efficiency. Simulation results and real data applications show the validity of the proposed methods.
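Here is a minimal sketch of a Stein-operator-based discrepancy on the simplest compact manifold, the circle. It uses a von Mises model p(θ) ∝ exp(κ cos(θ − μ)).

On the circle, the operator (T f)(θ) = f′(θ) + f(θ) ∂θ log p(θ) has mean zero under p for every smooth periodic f. The reason is that its expectation is the integral of (p f)′ over a closed curve. As a result, only the score −κ sin(θ − μ) is needed, and the normalizing constant never enters.

The code computes a U-statistic estimate of the squared kernel Stein discrepancy with the periodic kernel k(x, y) = exp(cos(x − y)/h). The kernel, the bandwidth h and the function name are assumptions. This is not the paper's general manifold construction. It also includes neither its interpretable model-criticism tools nor a calibrated rejection threshold.

```python
import numpy as np


def ksd_circle_vonmises(theta, mu, kappa, h=1.0):
    """U-statistic estimate of the squared kernel Stein discrepancy between a
    sample of angles `theta` (radians) and a von Mises(mu, kappa) model.

    Only the score d/dtheta log p = -kappa * sin(theta - mu) is used, so the
    model's normalizing constant is never computed.
    Kernel: k(x, y) = exp(cos(x - y) / h), smooth and 2*pi-periodic.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    x, y = theta[:, None], theta[None, :]
    s_x = -kappa * np.sin(x - mu)  # model score at x, shape (n, 1)
    s_y = -kappa * np.sin(y - mu)  # model score at y, shape (1, n)
    d = x - y
    k = np.exp(np.cos(d) / h)
    dk_dx = -np.sin(d) / h * k
    dk_dy = np.sin(d) / h * k
    d2k_dxdy = (np.cos(d) / h - np.sin(d) ** 2 / h**2) * k
    # Stein kernel u_p(x, y) for the Langevin-type operator on the circle
    u = s_x * s_y * k + s_x * dk_dy + s_y * dk_dx + d2k_dxdy
    return (u.sum() - np.trace(u)) / (n * (n - 1))  # drop the i == j terms
```

Values near zero indicate agreement with the model. For example, evaluating a von Mises(0, 2) sample against a model centered at π/2 gives a clearly positive value.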

Methodology

Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data

Tamara Fernandez, Nicolas Rivera, Wenkai Xu, 2020

Survival analysis and reliability theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest, such as death from a particular disease or failure of a component in a mechanical system. This type of data is unique due to the presence of censoring, a type of missing data that occurs when we do not observe the actual time of the event of interest but instead have access to an approximation of it, given by a random interval to which the observation is known to belong. Most traditional methods are not designed to deal with censoring, so they must be adapted to censored time-to-event data. In this paper, we focus on non-parametric goodness-of-fit testing procedures based on combining Stein's method and kernelized discrepancies. While for uncensored data there is a natural way of implementing a kernelized Stein discrepancy test, for censored data there are several options, each with different advantages and disadvantages. In this paper, we propose a collection of kernelized Stein discrepancy tests for time-to-event data, and we study each of them theoretically and empirically; our experimental results show that our proposed methods perform better than existing tests, including previous tests based on a kernelized maximum mean discrepancy.

Machine Learning, Methodology

A maximum-mean-discrepancy goodness-of-fit test for censored data

Tamara Fernandez, Arthur Gretton, 2018

We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals, a common occurrence in clinical trials and industrial life-testing. The test statistic is straightforward to compute, as is the test threshold, and we establish consistency under the null. Unlike earlier approaches such as the log-rank test, we make no assumptions about how the data distribution might differ from the null, and our test has power against a very rich class of alternatives. In experiments, our test outperforms competing approaches for periodic and Weibull hazard functions (where risks are time dependent), and it does not show the failure modes of tests that rely on user-defined features. Moreover, in cases where classical tests are provably most powerful, our test performs almost as well while being more general.

Methodology

A Goodness-of-Fit Test for Statistical Models

Hangjin Jiang, 2020

Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to make some of them useful. The question is how we can measure the usefulness of a statistical model for the data at hand; this is key to statistical prediction. The important statistical problem of testing whether the observations follow the proposed statistical model has attracted relatively little attention. In this paper, we propose a new framework for this problem by connecting it to two-sample distribution comparison. The proposed method can be applied to evaluate a wide range of models. Examples are given to show the performance of the proposed method.

Methodology, Statistics Applications