No Arabic abstract
We propose a one-step procedure to estimate the latent positions in random dot product graphs efficiently. Unlike the classical spectral-based methods such as the adjacency and Laplacian spectral embedding, the proposed one-step procedure takes advantage of both the low-rank structure of the expected adjacency matrix and the Bernoulli likelihood information of the sampling model simultaneously. We show that for each vertex, the corresponding row of the one-step estimator converges to a multivariate normal distribution after proper scaling and centering up to an orthogonal transformation, with an efficient covariance matrix. The initial estimator for the one-step procedure needs to satisfy the so-called approximate linearization property. The one-step estimator improves the commonly-adopted spectral embedding methods in the following sense: Globally for all vertices, it yields an asymptotic sum of squares error no greater than those of the spectral methods, and locally for each vertex, the asymptotic covariance matrix of the corresponding row of the one-step estimator dominates those of the spectral embeddings in spectra. The usefulness of the proposed one-step procedure is demonstrated via numerical examples and the analysis of a real-world Wikipedia graph dataset.
We propose a Bayesian approach, called the posterior spectral embedding, for estimating the latent positions in random dot product graphs, and prove its optimality. Unlike the classical spectral-based adjacency/Laplacian spectral embedding, the posterior spectral embedding is a fully-likelihood based graph estimation method taking advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax-lower bound for estimating the latent positions, and show that the posterior spectral embedding achieves this lower bound since it both results in a minimax-optimal posterior contraction rate, and yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models, the result of which strengthens an existing result concerning the number of mis-clustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogy of the adjacency spectral embedding, but the resulting posterior contraction rate is sub-optimal with an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of a Wikipedia graph data.
We develop a Nonparametric Empirical Bayes (NEB) framework for compound estimation in the discrete linear exponential family, which includes a wide class of discrete distributions frequently arising from modern big data applications. We propose to directly estimate the Bayes shrinkage factor in the generalized Robbins formula via solving a scalable convex program, which is carefully developed based on a RKHS representation of the Steins discrepancy measure. The new NEB estimation framework is flexible for incorporating various structural constraints into the data driven rule, and provides a unified approach to compound estimation with both regular and scaled squared error losses. We develop theory to show that the class of NEB estimators enjoys strong asymptotic properties. Comprehensive simulation studies as well as analyses of real data examples are carried out to demonstrate the superiority of the NEB estimator over competing methods.
In this paper we consider the drift estimation problem for a general differential equation driven by an additive multidimensional fractional Brownian motion, under ergodic assumptions on the drift coefficient. Our estimation procedure is based on the identification of the invariant measure, and we provide consistency results as well as some information about the convergence rate. We also give some examples of coefficients for which the identifiability assumption for the invariant measure is satisfied.
Inference on vertex-aligned graphs is of wide theoretical and practical importance.There are, however, few flexible and tractable statistical models for correlated graphs, and even fewer comprehensive approaches to parametric inference on data arising from such graphs. In this paper, we consider the correlated Bernoulli random graph model (allowing different Bernoulli coefficients and edge correlations for different pairs of vertices), and we introduce a new variance-reducing technique -- called emph{balancing} -- that can refine estimators for model parameters. Specifically, we construct a disagreement statistic and show that it is complete and sufficient; balancing can be interpreted as Rao-Blackwellization with this disagreement statistic. We show that for unbiased estimators of functions of model parameters, balancing generates uniformly minimum variance unbiased estimators (UMVUEs). However, even when unbiased estimators for model parameters do {em not} exist -- which, as we prove, is the case with both the heterogeneity correlation and the total correlation parameters -- balancing is still useful, and lowers mean squared error. In particular, we demonstrate how balancing can improve the efficiency of the alignment strength estimator for the total correlation, a parameter that plays a critical role in graph matchability and graph matching runtime complexity.
This paper deals with the dimension reduction for high-dimensional time series based on common factors. In particular we allow the dimension of time series $p$ to be as large as, or even larger than, the sample size $n$. The estimation for the factor loading matrix and the factor process itself is carried out via an eigenanalysis for a $ptimes p$ non-negative definite matrix. We show that when all the factors are strong in the sense that the norm of each column in the factor loading matrix is of the order $p^{1/2}$, the estimator for the factor loading matrix, as well as the resulting estimator for the precision matrix of the original $p$-variant time series, are weakly consistent in $L_2$-norm with the convergence rates independent of $p$. This result exhibits clearly that the `curse is canceled out by the `blessings in dimensionality. We also establish the asymptotic properties of the estimation when not all factors are strong. For the latter case, a two-step estimation procedure is preferred accordingly to the asymptotic theory. The proposed methods together with their asymptotic properties are further illustrated in a simulation study. An application to a real data set is also reported.