Inference on vertex-aligned graphs is of wide theoretical and practical importance. There are, however, few flexible and tractable statistical models for correlated graphs, and even fewer comprehensive approaches to parametric inference on data arising from such graphs. In this paper, we consider the correlated Bernoulli random graph model (allowing different Bernoulli coefficients and edge correlations for different pairs of vertices), and we introduce a new variance-reducing technique -- called \emph{balancing} -- that can refine estimators for model parameters. Specifically, we construct a disagreement statistic and show that it is complete and sufficient; balancing can be interpreted as Rao-Blackwellization with this disagreement statistic. We show that for unbiased estimators of functions of model parameters, balancing generates uniformly minimum variance unbiased estimators (UMVUEs). However, even when unbiased estimators for model parameters do \emph{not} exist -- which, as we prove, is the case with both the heterogeneity correlation and the total correlation parameters -- balancing is still useful, and lowers mean squared error. In particular, we demonstrate how balancing can improve the efficiency of the alignment strength estimator for the total correlation, a parameter that plays a critical role in graph matchability and graph matching runtime complexity.
Unmeasured confounding is a threat to causal inference and individualized decision making. Similar to Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a), we consider the problem of identification of optimal individualized treatment regimes with a valid instrumental variable. Han (2020a) provided an alternative identifying condition of optimal treatment regimes using the conditional Wald estimand of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020) when treatment assignment is subject to endogeneity and a valid binary instrumental variable is available. In this note, we provide a necessary and sufficient condition for identification of optimal treatment regimes using the conditional Wald estimand. Our novel condition is necessarily implied by those of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a) and may continue to hold in a variety of potential settings not covered by prior results.
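As a rough illustration of the conditional Wald estimand mentioned above, the following sketch computes, within each stratum of a discrete covariate, the ratio of the instrument's effect on the outcome to its effect on the treatment. The function name `conditional_wald`, the record layout `(x, z, a, y)`, and the restriction to a discrete covariate are assumptions made for this example only; it is a plug-in sample-mean version, not the estimator or the identifying condition of any of the cited papers.

```python
from collections import defaultdict


def conditional_wald(records):
    """Stratum-wise Wald ratio from (x, z, a, y) records.

    x: discrete covariate value (hashable), z: binary instrument (0 or 1),
    a: binary treatment, y: numeric outcome.
    Returns {x: (E[y|z=1,x] - E[y|z=0,x]) / (E[a|z=1,x] - E[a|z=0,x])}
    using sample means; strata with an empty instrument arm or a zero
    denominator are skipped.
    """
    # For each x and each z, keep [count, sum of a, sum of y].
    sums = defaultdict(lambda: [[0, 0.0, 0.0], [0, 0.0, 0.0]])
    for x, z, a, y in records:
        cell = sums[x][z]
        cell[0] += 1
        cell[1] += a
        cell[2] += y

    ratios = {}
    for x, (c0, c1) in sums.items():
        if c0[0] == 0 or c1[0] == 0:
            continue  # one instrument arm is empty in this stratum
        diff_a = c1[1] / c1[0] - c0[1] / c0[0]
        diff_y = c1[2] / c1[0] - c0[2] / c0[0]
        if diff_a != 0:
            ratios[x] = diff_y / diff_a
    return ratios
```

For instance, `conditional_wald([("a", 0, 0, 1.0), ("a", 0, 0, 1.0), ("a", 1, 1, 3.0), ("a", 1, 0, 1.0)])` gives a ratio of 2.0 for stratum `"a"`. Whether the sign of such a ratio identifies the optimal regime depends on the identifying conditions discussed in the abstract.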
The aim of online monitoring is to issue an alarm as soon as there is significant evidence in the collected observations to suggest that the underlying data generating mechanism has changed. This work is concerned with open-end, nonparametric procedures that can be interpreted as statistical tests. The proposed monitoring schemes consist of computing the so-called retrospective CUSUM statistic (or minor variations thereof) after the arrival of each new observation. After proposing suitable threshold functions for the chosen detectors, the asymptotic validity of the procedures is investigated in the special case of monitoring for changes in the mean, both under the null hypothesis of stationarity and relevant alternatives. To carry out the sequential tests in practice, an approach based on an asymptotic regression model is used to estimate high quantiles of relevant limiting distributions. Monte Carlo experiments demonstrate the good finite-sample behavior of the proposed monitoring schemes and suggest that they are superior to existing competitors as long as changes do not occur at the very beginning of the monitoring. Extensions to statistics exhibiting an asymptotic mean-like behavior are briefly discussed. Finally, the application of the derived sequential change-point detection tests is succinctly illustrated on temperature anomaly data.
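The idea of recomputing a retrospective CUSUM statistic after each new observation can be sketched as follows for changes in the mean. This is a minimal, unstandardized version: the statistic is max over 1 <= j < k of |S_j - (j/k) S_k| / sqrt(k), where S_j is the j-th partial sum, and the constant `threshold` is a hypothetical stand-in for the threshold functions and estimated quantiles used in the paper. The function names are likewise illustrative.

```python
from itertools import accumulate


def retrospective_cusum(xs):
    """Return max_{1<=j<k} |S_j - (j/k) * S_k| / sqrt(k) for a sample of size k."""
    k = len(xs)
    if k < 2:
        return 0.0
    partial = list(accumulate(xs))  # partial[j-1] = S_j
    total = partial[-1]             # S_k
    return max(abs(partial[j - 1] - j / k * total) for j in range(1, k)) / k ** 0.5


def monitor(stream, threshold, min_obs=2):
    """Recompute the statistic after each arrival; return the 1-based
    time of the first alarm, or None if the threshold is never exceeded."""
    seen = []
    for t, x in enumerate(stream, start=1):
        seen.append(x)
        if t >= min_obs and retrospective_cusum(seen) > threshold:
            return t
    return None
```

Recomputing the statistic from scratch costs O(k) per observation, which keeps the sketch short; a production scheme would also standardize by an estimate of the long-run variance.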
Optimal linear prediction (also known as kriging) of a random field $\{Z(x)\}_{x\in\mathcal{X}}$ indexed by a compact metric space $(\mathcal{X},d_{\mathcal{X}})$ can be obtained if the mean value function $m\colon\mathcal{X}\to\mathbb{R}$ and the covariance function $\varrho\colon\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ of $Z$ are known. We consider the problem of predicting the value of $Z(x^*)$ at some location $x^*\in\mathcal{X}$ based on observations at locations $\{x_j\}_{j=1}^n$ which accumulate at $x^*$ as $n\to\infty$ (or, more generally, predicting $\varphi(Z)$ based on $\{\varphi_j(Z)\}_{j=1}^n$ for linear functionals $\varphi, \varphi_1, \ldots, \varphi_n$). Our main result characterizes the asymptotic performance of linear predictors (as $n$ increases) based on an incorrect second order structure $(\tilde{m},\tilde{\varrho})$, without any restrictive assumptions on $\varrho, \tilde{\varrho}$ such as stationarity. We, for the first time, provide necessary and sufficient conditions on $(\tilde{m},\tilde{\varrho})$ for asymptotic optimality of the corresponding linear predictor holding uniformly with respect to $\varphi$. These general results are illustrated by weakly stationary random fields on $\mathcal{X}\subset\mathbb{R}^d$ with Matérn or periodic covariance functions, and on the sphere $\mathcal{X}=\mathbb{S}^2$ for the case of two isotropic covariance functions.
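For point observations, the linear predictor built from a given mean and covariance function has the familiar form m(x*) + c^T C^{-1} (z - m), where C is the covariance matrix of the observations and c the vector of covariances between the observations and the target. The sketch below implements this with the standard library only; `krige`, `_solve`, and the exponential covariance in the usage note are assumptions chosen for illustration, not the paper's setting. Passing a different (misspecified) pair of functions yields the predictor whose asymptotic performance the abstract is concerned with.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krige(x_star, xs, zs, mean, cov):
    """Linear predictor of Z(x_star) from observations zs at locations xs,
    using the supplied mean and covariance functions (which may be wrong)."""
    big_c = [[cov(s, t) for t in xs] for s in xs]
    small_c = [cov(s, x_star) for s in xs]
    weights = _solve(big_c, small_c)  # C^{-1} c, valid because C is symmetric
    return mean(x_star) + sum(w * (z - mean(s)) for w, z, s in zip(weights, zs, xs))
```

With `cov = lambda s, t: math.exp(-abs(s - t))` and a zero mean, predicting at a location that was itself observed reproduces the observed value, as expected for an interpolating predictor.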
The infinite-dimensional Hilbert sphere $S^\infty$ has been widely employed to model density functions and shapes, extending the finite-dimensional counterpart. We consider the Fréchet mean as an intrinsic summary of the central tendency of data lying on $S^\infty$. To pave the way for sound statistical inference, we derive properties of the Fréchet mean on $S^\infty$ by establishing its existence and uniqueness as well as a root-$n$ central limit theorem (CLT) for the sample version, overcoming obstructions from infinite-dimensionality and lack of compactness on $S^\infty$. Intrinsic CLTs for the estimated tangent vectors and covariance operator are also obtained. Asymptotic and bootstrap hypothesis tests for the Fréchet mean based on projection and norm are then proposed and are shown to be consistent. The proposed two-sample tests are applied to make inference for daily taxi demand patterns over Manhattan modeled as densities, of which the square roots are analyzed on the Hilbert sphere. Numerical properties of the proposed hypothesis tests, which utilize the spherical geometry, are studied in the real data application and simulations, where we demonstrate that the tests based on the intrinsic geometry compare favorably to those based on an extrinsic or flat geometry.
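To make the notion of an intrinsic mean on a sphere concrete, here is a finite-dimensional sketch: points are unit vectors (for example, discretized and normalized square-root densities), and the sample Fréchet mean is found by repeatedly averaging the log-mapped points in the tangent space and mapping back with the exponential map. The function names, the fixed-point iteration, and the assumption that all points lie well within an open hemisphere (true for nonnegative vectors) are choices made for this example, not the paper's procedure.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_map(mu, p):
    """Tangent vector at mu pointing to p, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, _dot(mu, p)))
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0] * len(mu)
    scale = theta / math.sin(theta)
    return [scale * (pi - c * mi) for pi, mi in zip(p, mu)]


def exp_map(mu, v):
    """Point reached from mu by following the geodesic with initial velocity v."""
    norm = math.sqrt(_dot(v, v))
    if norm < 1e-12:
        return list(mu)
    return [math.cos(norm) * mi + math.sin(norm) * vi / norm for mi, vi in zip(mu, v)]


def frechet_mean(points, iters=100, tol=1e-12):
    """Sample Frechet mean of unit vectors assumed to lie in an open hemisphere."""
    d = len(points[0])
    mu = [sum(p[i] for p in points) for i in range(d)]  # extrinsic starting point
    length = math.sqrt(_dot(mu, mu))
    mu = [m / length for m in mu]
    for _ in range(iters):
        logs = [log_map(mu, p) for p in points]
        step = [sum(v[i] for v in logs) / len(points) for i in range(d)]
        if math.sqrt(_dot(step, step)) < tol:
            break
        mu = exp_map(mu, step)
    return mu
```

On the unit circle, points at angles 0°, 0°, and 90° have their Fréchet mean at 30°, whereas the normalized arithmetic mean sits at about 26.6°, a small instance of the difference between intrinsic and extrinsic summaries.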
We propose a one-step procedure to estimate the latent positions in random dot product graphs efficiently. Unlike the classical spectral-based methods such as the adjacency and Laplacian spectral embedding, the proposed one-step procedure takes advantage of both the low-rank structure of the expected adjacency matrix and the Bernoulli likelihood information of the sampling model simultaneously. We show that for each vertex, the corresponding row of the one-step estimator converges to a multivariate normal distribution after proper scaling and centering up to an orthogonal transformation, with an efficient covariance matrix. The initial estimator for the one-step procedure needs to satisfy the so-called approximate linearization property. The one-step estimator improves on the commonly adopted spectral embedding methods in the following sense: globally for all vertices, it yields an asymptotic sum of squares error no greater than those of the spectral methods, and locally for each vertex, the asymptotic covariance matrix of the corresponding row of the one-step estimator dominates those of the spectral embeddings in spectra. The usefulness of the proposed one-step procedure is demonstrated via numerical examples and the analysis of a real-world Wikipedia graph dataset.