This paper introduces a Nearly Unstable INteger-valued AutoRegressive Conditional Heteroskedasticity (NU-INARCH) process for dealing with count time series data. It is proved that a proper normalization of the NU-INARCH process, endowed with a Skorohod topology, converges weakly to a Cox-Ingersoll-Ross diffusion. The asymptotic distribution of the conditional least squares estimator of the correlation parameter is established as a functional of certain stochastic integrals. Numerical experiments based on Monte Carlo simulations are provided to verify the behavior of the asymptotic distribution in finite samples. These simulations reveal that the nearly unstable approach provides satisfactory results, better than those based on the stationarity assumption, even when the true process is not very close to non-stationarity. A unit root test is proposed and its Type-I error and power are examined via Monte Carlo simulations. As an illustration, the proposed methodology is applied to the daily number of deaths due to COVID-19 in the United Kingdom.
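As a rough illustration of the setup, the following sketch simulates a Poisson INARCH(1) process in the nearly unstable regime and computes the conditional least squares (CLS) estimate of the autoregressive parameter. The conditional-mean specification λ_t = ω + α·X_{t−1} and the parameter values are assumptions for illustration, not the paper's exact parametrization:

```python
import numpy as np

def simulate_inarch1(n, omega, alpha, seed=None):
    """INARCH(1): X_t | past ~ Poisson(omega + alpha * X_{t-1})."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=np.int64)
    for t in range(1, n):
        x[t] = rng.poisson(omega + alpha * x[t - 1])
    return x

def cls_alpha(x):
    """Conditional least squares estimate of alpha: the OLS slope
    in the regression of X_t on X_{t-1} with an intercept."""
    z, y = x[:-1], x[1:]
    return np.cov(z, y, ddof=0)[0, 1] / np.var(z)

# nearly unstable regime: alpha_n = 1 - gamma/n approaches the unit root
n, gamma = 2000, 2.0
x = simulate_inarch1(n, omega=1.0, alpha=1 - gamma / n, seed=1)
alpha_hat = cls_alpha(x)
```

In this regime alpha_hat lands close to 1, which is the situation where, per the abstract, inference based on the stationarity assumption deteriorates and the nearly unstable asymptotics become relevant.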
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources, such as groups of observations in time series, collections of words in text, or batches of images of a given phenomenon. This observation pattern, however, differs from the common assumptions required for hypothesis testing: each set differs in size, may have differing levels of noise, and may also incorporate nuisance variability irrelevant to the analysis of the phenomenon of interest; all of these features bias test decisions if not accounted for. In this paper, we propose to interpret sets as independent samples from a collection of latent probability distributions, and introduce kernel two-sample and independence tests in this latent space of distributions. We prove the consistency of these tests and observe that they outperform alternatives in a wide range of synthetic experiments. Finally, we showcase their use in practice with experiments on healthcare and climate data, where heuristics were previously needed for feature extraction and testing.
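The latent-distribution idea can be sketched as follows: each observed set is treated as a sample from a latent distribution, sets are compared through a kernel on their mean embeddings, and a permutation test calibrates the resulting MMD statistic. This toy construction (Gaussian base kernel, linear kernel on embeddings, made-up data) is an assumption-laden illustration, not the paper's exact statistic:

```python
import numpy as np

def set_kernel(S, T, bw=1.0):
    """Linear kernel on Gaussian mean embeddings: the average Gaussian
    kernel value over all cross pairs of points from the two sets."""
    d2 = ((S[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw**2)).mean()

def mmd2(sets_a, sets_b, bw=1.0):
    """(Biased) MMD^2 between two collections of sets under the set kernel."""
    kaa = np.mean([set_kernel(s, t, bw) for s in sets_a for t in sets_a])
    kbb = np.mean([set_kernel(s, t, bw) for s in sets_b for t in sets_b])
    kab = np.mean([set_kernel(s, t, bw) for s in sets_a for t in sets_b])
    return kaa + kbb - 2 * kab

rng = np.random.default_rng(0)
# group A: sets drawn around mean 0; group B: sets drawn around mean 2
sets_a = [rng.normal(0.0, 1.0, size=(20, 2)) for _ in range(8)]
sets_b = [rng.normal(2.0, 1.0, size=(20, 2)) for _ in range(8)]

stat = mmd2(sets_a, sets_b)
# permutation calibration: reshuffle set labels and recompute the statistic
pooled = sets_a + sets_b
perm_stats = []
for _ in range(99):
    idx = rng.permutation(16)
    perm_stats.append(mmd2([pooled[i] for i in idx[:8]],
                           [pooled[i] for i in idx[8:]]))
p_value = (1 + sum(s >= stat for s in perm_stats)) / (len(perm_stats) + 1)
```

Because the test operates on whole sets via their embeddings, sets of different sizes pose no difficulty: each set contributes one embedding regardless of how many points it contains.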
We propose approaches for testing implementations of Markov Chain Monte Carlo methods as well as of general Monte Carlo methods. Based on statistical hypothesis tests, these approaches can be used in a unit testing framework to check, for example, whether individual steps in a Gibbs sampler or a reversible jump MCMC have the desired invariant distribution. Two exact tests for assessing whether a given Markov chain has a specified invariant distribution are discussed. These and other tests of Monte Carlo methods can be embedded into a sequential method that allows low expected effort if the simulation shows the desired behavior and high power if it does not. Moreover, the false rejection probability can be kept arbitrarily low. For general Monte Carlo methods, this allows testing, for example, whether a sampler has a specified distribution or produces samples with the desired mean. The methods have been implemented in the R package MCUnit.
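The invariance property behind such unit tests can be illustrated on a toy discrete chain (an assumed setup for illustration, not the paper's exact tests): if a Markov kernel K has invariant distribution π, then πK = π, so one MCMC step started from an exact draw of π leaves the marginal distribution unchanged, and this can be checked empirically:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])
m = len(pi)

# Metropolis kernel on {0, 1, 2} with a uniform proposal over the
# other two states; built so that pi is its invariant distribution
K = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            K[i, j] = (1.0 / (m - 1)) * min(1.0, pi[j] / pi[i])
    K[i, i] = 1.0 - K[i].sum()

# exact invariance: pi K = pi
assert np.allclose(pi @ K, pi)

# empirical check: start from exact draws of pi, take one step via
# inverse-CDF sampling on each row, and compare frequencies against pi
rng = np.random.default_rng(0)
x = rng.choice(m, size=200_000, p=pi)
cum = K.cumsum(axis=1)
y = (rng.random(x.size)[:, None] > cum[x]).sum(axis=1)
freq = np.bincount(y, minlength=m) / y.size
```

A real unit test would replace the frequency comparison with a formal hypothesis test (e.g. an exact multinomial or chi-squared test) so that the false rejection probability is controlled, which is the role the abstract describes.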
In unit root testing, a piecewise locally stationary process is adopted to accommodate nonstationary errors that can have both smooth and abrupt changes in second- or higher-order properties. Under this framework, the limiting null distributions of the conventional unit root test statistics are derived and shown to contain a number of unknown parameters. To circumvent the difficulty of direct consistent estimation, we propose to use the dependent wild bootstrap to approximate the non-pivotal limiting null distributions and provide a rigorous theoretical justification for bootstrap consistency. The proposed method is compared through finite sample simulations with the recolored wild bootstrap procedure, which was developed for errors that follow a heteroscedastic linear process. Further, a combination of autoregressive sieve recoloring with the dependent wild bootstrap is shown to perform well. The validity of the dependent wild bootstrap in a nonstationary setting is demonstrated for the first time, showing the possibility of extensions to other inference problems associated with locally stationary processes.
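The dependent wild bootstrap multiplies residuals by a smooth, dependent multiplier sequence so that local dependence is preserved in each bootstrap draw. A minimal sketch of one common multiplier construction (a moving average of iid normals, giving a Bartlett-type covariance; the bandwidth and residuals here are placeholders, not the paper's choices):

```python
import numpy as np

def dwb_multipliers(n, l, rng):
    """Gaussian multipliers with mean 0, unit variance, and covariance
    (1 - |i-j|/l)_+ between positions i and j: a moving average of
    length l of iid standard normals, rescaled to unit variance."""
    z = rng.standard_normal(n + l - 1)
    return np.convolve(z, np.ones(l) / np.sqrt(l), mode="valid")

rng = np.random.default_rng(0)
e = rng.standard_normal(500)              # stand-in residuals under the null
w = dwb_multipliers(len(e), l=10, rng=rng)
e_star = e * w                            # one dependent-wild-bootstrap draw
```

Because the multipliers are dependent only within the bandwidth l, the bootstrap series reproduces the short-range dependence of the residuals while remaining valid under the heteroscedasticity that a piecewise locally stationary error process allows.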
We completely characterize the Gneiting class of space-time covariance functions and give more relaxed conditions on the functions involved. We then give necessary conditions for the construction of compactly supported functions of the Gneiting type. These conditions are very general, since they do not depend on the Euclidean norm. Finally, we discuss a general class of positive definite functions used for multivariate Gaussian random fields, and give necessary criteria for its generator to be compactly supported.
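For concreteness, a standard member of the Gneiting class takes the form C(h, u) = σ² ψ(u²)^{−d/2} φ(‖h‖²/ψ(u²)); the sketch below evaluates it with the textbook choices φ(t) = exp(−c t^γ) and ψ(t) = (1 + a t^α)^β (assumed parameter values, not tied to the paper's relaxed conditions) and checks positive definiteness numerically on a random space-time design:

```python
import numpy as np

def gneiting_cov(h2, u2, d=2, sigma2=1.0, a=1.0, alpha=0.5, beta=1.0,
                 c=1.0, gamma=0.5):
    """Gneiting space-time covariance evaluated at squared spatial
    distance h2 and squared temporal lag u2, for spatial dimension d."""
    psi = (1.0 + a * u2**alpha) ** beta
    return sigma2 / psi ** (d / 2) * np.exp(-c * (h2 / psi) ** gamma)

# random space-time design: 40 points in [0,1]^2 x [0,1]
rng = np.random.default_rng(0)
s = rng.uniform(size=(40, 2))   # spatial locations
t = rng.uniform(size=40)        # time points
h2 = ((s[:, None, :] - s[None, :, :]) ** 2).sum(-1)
u2 = (t[:, None] - t[None, :]) ** 2
C = gneiting_cov(h2, u2)        # covariance matrix of the design
```

With these parameter ranges (0 < α, γ ≤ 1, 0 ≤ β ≤ 1) the resulting matrix is positive semidefinite, which the eigenvalues of C confirm up to numerical tolerance.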
Modularity is a popular metric for quantifying the degree of community structure within a network. The distribution of the largest eigenvalue of a network's edge weight or adjacency matrix is well studied and is frequently used as a substitute for modularity when performing statistical inference. However, we show that the largest eigenvalue and modularity are asymptotically uncorrelated, which suggests the need for inference directly on modularity itself when the network size is large. To this end, we derive the asymptotic distributions of modularity in the case where the network's edge weight matrix belongs to the Gaussian Orthogonal Ensemble, and study the statistical power of the corresponding test for community structure under an alternative model. We empirically explore universality extensions of the limiting distribution and demonstrate the accuracy of these asymptotic distributions through type I error simulations. We also compare the empirical powers of the modularity-based tests with some existing methods. Our method is then used to test for the presence of community structure in two real data applications.
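The largest-eigenvalue benchmark mentioned above can be sketched numerically (a toy illustration with an assumed scaling, separate from the paper's modularity results): for a GOE matrix normalized so its spectrum converges to [−2, 2], the largest eigenvalue concentrates at the semicircle edge 2, which is what makes it a convenient null reference for community-structure tests:

```python
import numpy as np

def goe(n, rng):
    """Scaled GOE draw: symmetrize an n x n Gaussian matrix and divide
    by sqrt(2n), so the limiting spectrum is the semicircle on [-2, 2]."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

rng = np.random.default_rng(0)
lam_max = np.linalg.eigvalsh(goe(400, rng))[-1]   # close to the edge 2
```

Under an alternative with genuine community structure, the leading eigenvalue separates from this bulk edge; the paper's point is that modularity itself, being asymptotically uncorrelated with this eigenvalue, carries complementary information.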