The classical setting of community detection consists of networks exhibiting a clustered structure. To more accurately model real systems, we consider a class of networks (i) whose edges may carry labels and (ii) which may lack a clustered structure. Specifically, we assume that nodes possess latent attributes drawn from a general compact space, and that edges between two nodes are randomly generated and labeled according to some unknown distribution that is a function of their latent attributes. Our goal is then to infer the edge label distributions from a partially observed network. We propose a computationally efficient spectral algorithm and show that it allows for asymptotically correct inference even when the average node degree is only logarithmic in the total number of nodes. Conversely, if the average node degree is below a specific constant threshold, we show that no algorithm can achieve better inference than guessing without using the observations. As a byproduct of our analysis, we show that our model provides a general procedure to construct random graph models with a spectrum asymptotic to a pre-specified eigenvalue distribution such as a power-law distribution.
We consider the problem of estimating common community structures in multi-layer stochastic block models, where each single layer may not have sufficient signal strength to recover the full community structure. In order to efficiently aggregate signal across different layers, we argue that the sum of squared adjacency matrices contains sufficient signal even when individual layers are very sparse. Our method features a bias-removal step that is necessary when the squared noise matrices may overwhelm the signal in the very sparse regime. The analysis of our method uses several novel tail probability bounds for matrix linear combinations with matrix-valued coefficients and matrix-valued quadratic forms, which may be of independent interest. The performance of our method and the necessity of bias removal are demonstrated on synthetic data and in a microarray analysis of gene co-expression networks.
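The aggregation step can be illustrated with a minimal sketch. It assumes that bias removal amounts to subtracting each layer's diagonal degree matrix from its squared adjacency matrix (for a binary symmetric matrix with no self-loops, the diagonal of $A^2$ equals the degree vector); the abstract does not spell out the exact correction. The function names, the two-community sampler, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def bias_adjusted_sum_of_squares(layers):
    """Sum A @ A over layers and subtract each layer's degree diagonal.

    Assumption: each layer is a binary symmetric adjacency matrix with
    zero diagonal, so diag(A @ A) equals the degree vector.
    """
    n = layers[0].shape[0]
    total = np.zeros((n, n))
    for A in layers:
        A = np.asarray(A, dtype=float)
        degrees = A.sum(axis=1)
        total += A @ A - np.diag(degrees)
    return total

def leading_eigenvectors(S, k):
    """Return the k eigenvectors of S with largest absolute eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(np.abs(eigvals))[::-1][:k]
    return eigvecs[:, order]

def sample_layer(rng, labels, p_in, p_out):
    """Draw one sparse layer from a hypothetical two-block model."""
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper + upper.T).astype(float)

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
layers = [sample_layer(rng, labels, 0.05, 0.01) for _ in range(50)]

S = bias_adjusted_sum_of_squares(layers)
embedding = leading_eigenvectors(S, k=2)  # rows would then be clustered
```

Clustering the rows of `embedding` (for example with k-means) would then give community estimates; that step is omitted here to keep the sketch dependent on NumPy only.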
We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel which only depends on pairwise distances. We study the problem of inferring this interaction kernel from observations of the positions of the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to this inverse problem, based on a regularized maximum likelihood estimator constrained to suitable hypothesis spaces adaptive to data. We show that a coercivity condition enables us to control the condition number of this problem and prove the consistency of our estimator, and that in fact it converges at a near-optimal learning rate, equal to the minimax rate of $1$-dimensional non-parametric regression. In particular, this rate is independent of the dimension of the state space, which is typically very high. We also analyze the discretization errors in the case of discrete-time observations, showing that they are of order $1/2$ in terms of the time gaps between observations. This term, when large, dominates the sampling error and the approximation error, preventing convergence of the estimator. Finally, we exhibit an efficient parallel algorithm to construct the estimator from data, and we demonstrate the effectiveness of our algorithm with numerical tests on prototype systems including stochastic opinion dynamics and a Lennard-Jones model.
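To make the data-generating setting concrete, the following sketch simulates discrete-time observations of such a system with an Euler-Maruyama scheme. It assumes a first-order model of the form $dX_i = \frac{1}{N}\sum_j \phi(|X_j - X_i|)(X_j - X_i)\,dt + \sigma\,dB_i$, which is one common form and not necessarily the exact one treated in the paper. The kernel `phi`, the function name `simulate`, and all parameter values are hypothetical. The sketch covers only the forward simulation, not the estimator.

```python
import numpy as np

def simulate(x0, kernel, sigma, dt, steps, rng):
    """Euler-Maruyama path of N interacting particles in d dimensions.

    Assumed model (illustrative): the drift of particle i is the average
    over j of kernel(|x_j - x_i|) * (x_j - x_i).
    """
    n, d = x0.shape
    path = np.empty((steps + 1, n, d))
    path[0] = x0
    for t in range(steps):
        x = path[t]
        diff = x[None, :, :] - x[:, None, :]   # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)    # pairwise distances
        drift = (kernel(dist)[:, :, None] * diff).mean(axis=1)
        noise = sigma * np.sqrt(dt) * rng.standard_normal((n, d))
        path[t + 1] = x + drift * dt + noise
    return path

def phi(r):
    """Hypothetical smooth, bounded interaction kernel."""
    return 1.0 / (1.0 + r**2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((10, 2))

noisy_path = simulate(x0, phi, sigma=0.1, dt=0.01, steps=200, rng=rng)
noiseless_path = simulate(x0, phi, sigma=0.0, dt=0.01, steps=200, rng=rng)
```

With `sigma=0.0` the pairwise forces cancel, so the centroid of the particles stays fixed along the path, which gives a quick sanity check on the drift computation.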
Modelling edge weights, which reveal the extent of relationships among individuals, plays a crucial role in the analysis of network data. Due to the diversity of weight information, sharing these data in a privacy-preserving way has become a complicated challenge. In this paper, we consider the case of the non-denoising process to achieve the trade-off between privacy and weight information in the generalized $\beta$-model. Under edge differential privacy with a discrete Laplace mechanism, the Z-estimators from estimating equations for the model parameters are shown to be consistent and asymptotically normally distributed. Simulations and a real data example are given to further support the theoretical results.
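A discrete Laplace mechanism can be sketched as follows. The sketch uses the standard fact that the difference of two independent geometric variables with success probability $1 - \lambda$ has probability mass proportional to $\lambda^{|z|}$, with $\lambda = e^{-\epsilon/\Delta}$. Applying the noise to a vector of integer-valued weighted degrees with sensitivity $\Delta = 2$ is an assumption made here for illustration; the function names and the example data are hypothetical.

```python
import numpy as np

def discrete_laplace_noise(size, epsilon, sensitivity, rng):
    """Sample integer noise with P(Z = z) proportional to lam ** abs(z)."""
    lam = np.exp(-epsilon / sensitivity)
    p = 1.0 - lam
    # The difference of two i.i.d. geometric draws is discrete Laplace;
    # the shift in NumPy's support {1, 2, ...} cancels in the difference.
    return rng.geometric(p, size=size) - rng.geometric(p, size=size)

def privatize(values, epsilon, sensitivity, rng):
    """Release integer-valued statistics with added discrete Laplace noise."""
    values = np.asarray(values, dtype=np.int64)
    noise = discrete_laplace_noise(values.shape, epsilon, sensitivity, rng)
    return values + noise

rng = np.random.default_rng(0)
weighted_degrees = np.array([3, 7, 2, 5, 7])  # hypothetical statistic
noisy_degrees = privatize(weighted_degrees, epsilon=1.0, sensitivity=2, rng=rng)

# A large sample to check that the noise is centered at zero.
large_sample = discrete_laplace_noise(100_000, 1.0, 2, rng)
```

In the non-denoising setting, the noisy statistic would be fed directly into the estimating equations without a post-processing step; the estimation itself is not shown here.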
A general class of time-varying regression models is considered in this paper. We estimate the regression coefficients by using local linear M-estimation. For these estimators, weak Bahadur representations are obtained and are used to construct simultaneous confidence bands. For practical implementation, we propose a bootstrap-based method to circumvent the slow logarithmic convergence of the theoretical simultaneous bands. Our results substantially generalize and unify the treatments for several time-varying regression and auto-regression models. The performance for ARCH and GARCH models is studied in simulations, and a few real-life applications of our study are presented through the analysis of some popular financial datasets.
Additive models, as a natural generalization of linear regression, have played an important role in studying nonlinear relationships. Despite a rich literature and many recent advances on the topic, the statistical inference problem in additive models is still relatively poorly understood. Motivated by the inference for the exposure effect and other applications, we tackle in this paper the statistical inference problem for $f_1'(x_0)$ in additive models, where $f_1$ denotes the univariate function of interest and $f_1'(x_0)$ denotes its first-order derivative evaluated at a specific point $x_0$. The main challenge for this local inference problem is the understanding and control of the additional uncertainty due to the need to estimate the other components of the additive model as nuisance functions. To address this, we propose a decorrelated local linear estimator, which is particularly useful in reducing the effect of the nuisance function estimation error on the estimation accuracy of $f_1'(x_0)$. We establish the asymptotic limiting distribution for the proposed estimator and then construct confidence interval and hypothesis testing procedures for $f_1'(x_0)$. The variance level of the proposed estimator is of the same order as that of the local least squares in nonparametric regression, or equivalently the additive model with one component, while the bias of the proposed estimator is jointly determined by the statistical accuracies in estimating the nuisance functions and the relationship between the variable of interest and the nuisance variables. The method is developed for general additive models and is demonstrated in the high-dimensional sparse setting.
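For orientation, the one-component benchmark mentioned above (local least squares in nonparametric regression) can be sketched as a kernel-weighted linear fit around $x_0$ whose slope estimates the derivative at $x_0$. This is the plain local linear estimator, not the decorrelated estimator proposed in the paper. The Epanechnikov kernel, the bandwidth, and the function name are assumptions made for illustration.

```python
import numpy as np

def local_linear_derivative(x, y, x0, bandwidth):
    """Estimate the derivative at x0 as the slope of a local linear fit.

    Uses Epanechnikov kernel weights (an illustrative choice) and solves
    the weighted least squares problem of y on (1, x - x0).
    """
    u = (x - x0) / bandwidth
    weights = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    design = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(weights)  # scaling rows by sqrt(w) gives weighted LS
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[1]         # slope = derivative estimate at x0

x = np.linspace(0.0, 1.0, 2001)

# Exactly linear data: the local slope should recover the true slope 2.
slope_linear = local_linear_derivative(x, 2.0 * x + 1.0, x0=0.5, bandwidth=0.1)

# Smooth nonlinear data without noise: the slope approximates cos(0.5).
slope_sine = local_linear_derivative(x, np.sin(x), x0=0.5, bandwidth=0.1)
```

In an additive model with several components, the other components would contaminate this fit, which is the nuisance-estimation issue the decorrelation step is designed to address.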