Bias-adjusted spectral clustering in multi-layer stochastic block models

131 0 0.0 ( 0 )

Download Cite

Added by Jing Lei

Publication date 2020

fields Mathematical Statistics

and research's language is English

Authors Jing Lei - Kevin Z. Lin

Statistics Theory Statistics Theory

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We consider the problem of estimating common community structures in multi-layer stochastic block models, where each single layer may not have sufficient signal strength to recover the full community structure. In order to efficiently aggregate signal across different layers, we argue that the sum-of-squared adjacency matrices contains sufficient signal even when individual layers are very sparse. Our method features a bias-removal step that is necessary when the squared noise matrices may overwhelm the signal in the very sparse regime. The analysis of our method uses several novel tail probability bounds for matrix linear combinations with matrix-valued coefficients and matrix-valued quadratic forms, which may be of independent interest. The performance of our method and the necessity of bias removal is demonstrated in synthetic data and in microarray analysis about gene co-expression networks.

rate research

Structure Amplification on Multi-layer Stochastic Block Models

167 - Xiaodong Xin , Kun He , Jialu Bao 2021

Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our previous work proposes a general structure amplification technique called HICODE that uncovers many layers of functional hidden structure in complex networks. HICODE incrementally weakens dominant structure through randomization allowing the hidden functionality to emerge, and uncovers these hidden structure in real-world networks that previous methods rarely uncover. In this work, we conduct a comprehensive and systematic theoretical analysis on the hidden community structure. In what follows, we define multi-layer stochastic block model, and provide theoretical support using the model on why the existence of hidden structure will make the detection of dominant structure harder compared with equivalent random noise. We then provide theoretical proofs that the iterative reducing methods could help promote the uncovering of hidden structure as well as boosting the detection quality of dominant structure.

Social and Information Networks Machine Learning

Edge Label Inference in Generalized Stochastic Block Models: from Spectral Theory to Impossibility Results

136 - Jiaming Xu , Laurent Massoulie , Marc Lelarge 2014

The classical setting of community detection consists of networks exhibiting a clustered structure. To more accurately model real systems we consider a class of networks (i) whose edges may carry labels and (ii) which may lack a clustered structure. Specifically we assume that nodes possess latent attributes drawn from a general compact space and edges between two nodes are randomly generated and labeled according to some unknown distribution as a function of their latent attributes. Our goal is then to infer the edge label distributions from a partially observed network. We propose a computationally efficient spectral algorithm and show it allows for asymptotically correct inference when the average node degree could be as low as logarithmic in the total number of nodes. Conversely, if the average node degree is below a specific constant threshold, we show that no algorithm can achieve better inference than guessing without using the observations. As a byproduct of our analysis, we show that our model provides a general procedure to construct random graph models with a spectrum asymptotic to a pre-specified eigenvalue distribution such as a power-law distribution.

Statistics Theory Machine Learning Statistics Theory

Consistency of the maximum likelihood and variational estimators in a dynamic stochastic block model

68 - Lea Longepierre 2019

We consider a dynamic version of the stochastic block model, in which the nodes are partitioned into latent classes and the connection between two nodes is drawn from a Bernoulli distribution depending on the classes of these two nodes. The temporal evolution is modeled through a hidden Markov chain on the nodes memberships. We prove the consistency (as the number of nodes and time steps increase) of the maximum likelihood and variational estimators of the model parameters, and obtain upper bounds on the rates of convergence of these estimators. We also explore the particular case where the number of time steps is fixed and connectivity parameters are allowed to vary.

Statistics Theory Statistics Theory

Adjusted likelihood inference in an elliptical multivariate errors-in-variables model

150 - Tatiane F. N. Melo , Silvia L. P. Ferrari 2011

In this paper we obtain an adjusted version of the likelihood ratio test for errors-in-variables multivariate linear regression models. The error terms are allowed to follow a multivariate distribution in the class of the elliptical distributions, which has the multivariate normal distribution as a special case. We derive a modified likelihood ratio statistic that follows a chi-squared distribution with a high degree of accuracy. Our results generalize those in Melo and Ferrari(Advances in Statistical Analysis, 2010, 94, 75-87) by allowing the parameter of interest to be vector-valued in the multivariate errors-in-variables model. We report a simulation study which shows that the proposed test displays superior finite sample behavior relative to the standard likelihood ratio test.

Statistics Theory Statistics Theory

Estimation of Smooth Functionals in Normal Models: Bias Reduction and Asymptotic Efficiency

104 - Vladimir Koltchinskii , Mayya Zhilova 2019

Let $X_1,dots, X_n$ be i.i.d. random variables sampled from a normal distribution $N(mu,Sigma)$ in ${mathbb R}^d$ with unknown parameter $theta=(mu,Sigma)in Theta:={mathbb R}^dtimes {mathcal C}_+^d,$ where ${mathcal C}_+^d$ is the cone of positively definite covariance operators in ${mathbb R}^d.$ Given a smooth functional $f:Theta mapsto {mathbb R}^1,$ the goal is to estimate $f(theta)$ based on $X_1,dots, X_n.$ Let $$ Theta(a;d):={mathbb R}^dtimes Bigl{Sigmain {mathcal C}_+^d: sigma(Sigma)subset [1/a, a]Bigr}, ageq 1, $$ where $sigma(Sigma)$ is the spectrum of covariance $Sigma.$ Let $hat theta:=(hat mu, hat Sigma),$ where $hat mu$ is the sample mean and $hat Sigma$ is the sample covariance, based on the observations $X_1,dots, X_n.$ For an arbitrary functional $fin C^s(Theta),$ $s=k+1+rho, kgeq 0, rhoin (0,1],$ we define a functional $f_k:Theta mapsto {mathbb R}$ such that begin{align*} & sup_{thetain Theta(a;d)}|f_k(hat theta)-f(theta)|_{L_2({mathbb P}_{theta})} lesssim_{s, beta} |f|_{C^{s}(Theta)} biggr[biggl(frac{a}{sqrt{n}} bigvee a^{beta s}biggl(sqrt{frac{d}{n}}biggr)^{s} biggr)wedge 1biggr], end{align*} where $beta =1$ for $k=0$ and $beta>s-1$ is arbitrary for $kgeq 1.$ This error rate is minimax optimal and similar bounds hold for more general loss functions. If $d=d_nleq n^{alpha}$ for some $alphain (0,1)$ and $sgeq frac{1}{1-alpha},$ the rate becomes $O(n^{-1/2}).$ Moreover, for $s>frac{1}{1-alpha},$ the estimators $f_k(hat theta)$ is shown to be asymptotically efficient. The crucial part of the construction of estimator $f_k(hat theta)$ is a bias reduction method studied in the paper for more general statistical models than normal.

Statistics Theory Statistics Theory

comments

Fetching comments

Ebla Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Bias-adjusted spectral clustering in multi-layer stochastic block models

Ask ChatGPT about the research

No Arabic abstract

Read More