No Arabic abstract
Community detection has been well studied in network analysis, and one popular technique is spectral clustering which is fast and statistically analyzable for detect-ing clusters for given networks. But the more realistic case of mixed membership community detection remains a challenge. In this paper, we propose a new spectral clustering method Mixed-SLIM for mixed membership community detection. Mixed-SLIM is designed based on the symmetrized Laplacian inverse matrix (SLIM) (Jing et al. 2021) under the degree-corrected mixed membership (DCMM) model. We show that this algorithm and its regularized version Mixed-SLIM {tau} are asymptotically consistent under mild conditions. Meanwhile, we provide Mixed-SLIM appro and its regularized version Mixed-SLIM {tau}appro by approximating the SLIM matrix when dealing with large networks in practice. These four Mixed-SLIM methods outperform state-of-art methods in simulations and substantial empirical datasets for both community detection and mixed membership community detection problems.
This paper considers the problem of modeling and estimating community memberships of nodes in a directed network where every row (column) node is associated with a vector determining its membership in each row (column) community. To model such directed network, we propose directed degree corrected mixed membership (DiDCMM) model by considering degree heterogeneity. DiDCMM is identifiable under popular conditions for mixed membership network when considering degree heterogeneity. Based on the cone structure inherent in the normalized version of the left singular vectors and the simplex structure inherent in the right singular vectors of the population adjacency matrix, we build an efficient algorithm called DiMSC to infer the community membership vectors for both row nodes and column nodes. By taking the advantage of DiMSCs equivalence algorithm which returns same estimations as DiMSC and the recent development on row-wise singular vector deviation, we show that the proposed algorithm is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vectors of each row node and each column node under DiDCMM. The theory is supplemented by a simulation study.
Mixed linear regression (MLR) model is among the most exemplary statistical tools for modeling non-linear distributions using a mixture of linear models. When the additive noise in MLR model is Gaussian, Expectation-Maximization (EM) algorithm is a widely-used algorithm for maximum likelihood estimation of MLR parameters. However, when noise is non-Gaussian, the steps of EM algorithm may not have closed-form update rules, which makes EM algorithm impractical. In this work, we study the maximum likelihood estimation of the parameters of MLR model when the additive noise has non-Gaussian distribution. In particular, we consider the case that noise has Laplacian distribution and we first show that unlike the the Gaussian case, the resulting sub-problems of EM algorithm in this case does not have closed-form update rule, thus preventing us from using EM in this case. To overcome this issue, we propose a new algorithm based on combining the alternating direction method of multipliers (ADMM) with EM algorithm idea. Our numerical experiments show that our method outperforms the EM algorithm in statistical accuracy and computational time in non-Gaussian noise case.
We consider the matrix completion problem of recovering a structured low rank matrix with partially observed entries with mixed data types. Vast majority of the solutions have proposed computationally feasible estimators with strong statistical guarantees for the case where the underlying distribution of data in the matrix is continuous. A few recent approaches have extended using similar ideas these estimators to the case where the underlying distributions belongs to the exponential family. Most of these approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm. We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step. We also provide extensive simulation evidence that corroborate our theoretical results.
In this paper we derive the inverse spatial Laplacian for static, spherically symmetric backgrounds by solving Poissons equation for a point source. This is different from the electrostatic Green function, which is defined on the four dimensional static spacetime, while the equation we consider is defined on the spatial hypersurface of such spacetimes. This Green function is relevant in the Hamiltonian dynamics of theories defined on spherically symmetric backgrounds, and closed form expressions for the solutions we find are absent in the literature. We derive an expression in terms of elementary functions for the Schwarzschild spacetime, and comment on the relation of this solution with the known Green function of the spacetime Laplacian operator. We also find an expression for the Green function on the static pure de Sitter space in terms of hypergeometric functions.
Outlier detection methods have become increasingly relevant in recent years due to increased security concerns and because of its vast application to different fields. Recently, Pauwels and Lasserre (2016) noticed that the sublevel sets of the inverse Christoffel function accurately depict the shape of a cloud of data using a sum-of-squares polynomial and can be used to perform outlier detection. In this work, we propose a kernelized variant of the inverse Christoffel function that makes it computationally tractable for data sets with a large number of features. We compare our approach to current methods on 15 different data sets and achieve the best average area under the precision recall curve (AUPRC) score, the best average rank and the lowest root mean square deviation.