No Arabic abstract
Any modern network inference paradigm must incorporate multiple aspects of network structure, including information that is often encoded both in vertices and in edges. Methodology for handling vertex attributes has been developed for a number of network models, but comparable techniques for edge-related attributes remain largely unavailable. We address this gap in the literature by extending the latent position random graph model to the line graph of a random graph, which is formed by creating a vertex for each edge in the original random graph, and connecting each pair of edges incident to a common vertex in the original graph. We prove concentration inequalities for the spectrum of a line graph, and then establish that although naive spectral decompositions can fail to extract necessary signal for edge clustering, there exist signal-preserving singular subspaces of the line graph that can be recovered through a carefully-chosen projection. Moreover, we can consistently estimate edge latent positions in a random line graph, even though such graphs are of a random size, typically have high rank, and possess no spectral gap. Our results also demonstrate that the line graph of a stochastic block model exhibits underlying block structure, and we synthesize and test our methods in simulations for cluster recovery and edge covariate inference in stochastic block model graphs.
This contribution gives an extensive study on spectra of mixed graphs via its Hermitian adjacency matrix of the second kind introduced by Mohar [21]. This matrix is indexed by the vertices of the mixed graph, and the entry corresponding to an arc from $u$ to $v$ is equal to the sixth root of unity $omega=frac{1+{bf i}sqrt{3}}{2}$ (and its symmetric entry is $overline{omega}=frac{1-{bf i}sqrt{3}}{2}$); the entry corresponding to an undirected edge is equal to 1, and 0 otherwise. The main results of this paper include the following: Some interesting properties are discovered about the characteristic polynomial of this novel matrix. Cospectral problems among mixed graphs, including mixed graphs and their underlying graphs, are studied. We give equivalent conditions for a mixed graph that shares the same spectrum of its Hermitian adjacency matrix of the second kind ($H_S$-spectrum for short) with its underlying graph. A sharp upper bound on the $H_S$-spectral radius is established and the corresponding extremal mixed graphs are identified. Operations which are called three-way switchings are discussed--they give rise to a large number of $H_S$-cospectral mixed graphs. We extract all the mixed graphs whose rank of its Hermitian adjacency matrix of the second kind ($H_S$-rank for short) is $2$ (resp. 3). Furthermore, we show that all connected mixed graphs with $H_S$-rank $2$ can be determined by their $H_S$-spectrum. However, this does not hold for all connected mixed graphs with $H_S$-rank $3$. We identify all mixed graphs whose eigenvalues of its Hermitian adjacency matrix of the second kind ($H_S$-eigenvalues for short) lie in the range $(-alpha,, alpha)$ for $alphainleft{sqrt{2},,sqrt{3},,2right}$.
Generating large synthetic attributed graphs with node labels is an important task to support various experimental studies for graph analysis methods. Existing graph generators fail to simultaneously simulate the relationships between labels, attributes, and topology which real-world graphs exhibit. Motivated by this limitation, we propose GenCAT, an attributed graph generator for controlling those relationships, which has the following advantages. (i) GenCAT generates graphs with user-specified node degrees and flexibly controls the relationship between nodes and labels by incorporating the connection proportion for each node to classes. (ii) Generated attribute values follow user-specified distributions, and users can flexibly control the correlation between the attributes and labels. (iii) Graph generation scales linearly to the number of edges. GenCAT is the first generator to support all three of these practical features. Through extensive experiments, we demonstrate that GenCAT can efficiently generate high-quality complex attributed graphs with user-controlled relationships between labels, attributes, and topology.
In network science, assortativity refers to the tendency of links to exist between nodes with similar attributes. In social networks, for example, links tend to exist between individuals of similar age, nationality, location, race, income, educational level, religious belief, and language. Thus, various attributes jointly affect the network topology. An interesting problem is to detect community structure beyond some specific assortativity-related attributes $rho$, i.e., to take out the effect of $rho$ on network topology and reveal the hidden community structure which are due to other attributes. An approach to this problem is to redefine the null model of the modularity measure, so as to simulate the effect of $rho$ on network topology. However, a challenge is that we do not know to what extent the network topology is affected by $rho$ and by other attributes. In this paper, we propose Dist-Modularity which allows us to freely choose any suitable function to simulate the effect of $rho$. Such freedom can help us probe the effect of $rho$ and detect the hidden communities which are due to other attributes. We test the effectiveness of Dist-Modularity on synthetic benchmarks and two real-world networks.
Common models for random graphs, such as ErdH{o}s-R{e}nyi and Kronecker graphs, correspond to generating random adjacency matrices where each entry is non-zero based on a large matrix of probabilities. Generating an instance of a random graph based on these models is easy, although inefficient, by flipping biased coins (i.e. sampling binomial random variables) for each possible edge. This process is inefficient because most large graph models correspond to sparse graphs where the vast majority of coin flips will result in no edges. We describe some not-entirely-well-known, but not-entirely-unknown, techniques that will enable us to sample a graph by finding only the coin flips that will produce edges. Our analogies for these procedures are ball-dropping, which is easier to implement, but may need extra work due to duplicate edges, and grass-hopping, which results in no duplicated work or extra edges. Grass-hopping does this using geometric random variables. In order to use this idea on complex probability matrices such as those in Kronecker graphs, we decompose the problem into three steps, each of which are independently useful computational primitives: (i) enumerating non-decreasing sequences, (ii) unranking multiset permutations, and (iii) decoding and encoding z-curve and Morton codes and permutations. The third step is the result of a new connection between repeated Kronecker product operations and Morton codes. Throughout, we draw connections to ideas underlying applied math and computer science including coupon collector problems.
The infinite-dimensional Hilbert sphere $S^infty$ has been widely employed to model density functions and shapes, extending the finite-dimensional counterpart. We consider the Frechet mean as an intrinsic summary of the central tendency of data lying on $S^infty$. To break a path for sound statistical inference, we derive properties of the Frechet mean on $S^infty$ by establishing its existence and uniqueness as well as a root-$n$ central limit theorem (CLT) for the sample version, overcoming obstructions from infinite-dimensionality and lack of compactness on $S^infty$. Intrinsic CLTs for the estimated tangent vectors and covariance operator are also obtained. Asymptotic and bootstrap hypothesis tests for the Frechet mean based on projection and norm are then proposed and are shown to be consistent. The proposed two-sample tests are applied to make inference for daily taxi demand patterns over Manhattan modeled as densities, of which the square roots are analyzed on the Hilbert sphere. Numerical properties of the proposed hypothesis tests which utilize the spherical geometry are studied in the real data application and simulations, where we demonstrate that the tests based on the intrinsic geometry compare favorably to those based on an extrinsic or flat geometry.