ترغب بنشر مسار تعليمي؟ اضغط هنا

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, nodes ranking and feature inference. Pri or work on local graph clustering mostly falls into two categories with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with p-norm network flow for $pin (1,infty)$. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $prightarrowinfty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $pge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate our approach compares favorably with existing methods.
Local graph clustering methods aim to find small clusters in very large graphs. These methods take as input a graph and a seed node, and they return as output a good cluster in a running time that depends on the size of the output cluster but that is independent of the size of the input graph. In this paper, we adopt a statistical perspective on local graph clustering, and we analyze the performance of the l1-regularized PageRank method~(Fountoulakis et. al.) for the recovery of a single target cluster, given a seed node inside the cluster. Assuming the target cluster has been generated by a random model, we present two results. In the first, we show that the optimal support of l1-regularized PageRank recovers the full target cluster, with bounded false positives. In the second, we show that if the seed node is connected solely to the target cluster then the optimal support of l1-regularized PageRank recovers exactly the target cluster. We also show empirically that l1-regularized PageRank has a state-of-the-art performance on many real graphs, demonstrating the superiority of the method. From a computational perspective, we show that the solution path of l1-regularized PageRank is monotonic. This allows for the application of the forward stagewise algorithm, which approximates the solution path in running time that does not depend on the size of the whole graph. Finally, we show that l1-regularized PageRank and approximate personalized PageRank (APPR), another very popular method for local graph clustering, are equivalent in the sense that we can lower and upper bound the output of one with the output of the other. Based on this relation, we establish for APPR similar results to those we establish for l1-regularized PageRank.
Graph clustering has many important applications in computing, but due to the increasing sizes of graphs, even traditionally fast clustering methods can be computationally expensive for real-world graphs of interest. Scalability problems led to the d evelopment of local graph clustering algorithms that come with a variety of theoretical guarantees. Rather than return a global clustering of the entire graph, local clustering algorithms return a single cluster around a given seed node or set of seed nodes. These algorithms improve scalability because they use time and memory resources that depend only on the size of the cluster returned, instead of the size of the input graph. Indeed, for many of them, their running time grows linearly with the size of the output. In addition to scalability arguments, local graph clustering algorithms have proven to be very useful for identifying and interpreting small-scale and meso-scale structure in large-scale graphs. As opposed to heuristic operational procedures, this class of algorithms comes with strong algorithmic and statistical theory. These include statistical guarantees that prove they have implicit regularization properties. One of the challenges with the existing literature on these approaches is that they are published in a wide variety of areas, including theoretical computer science, statistics, data science, and mathematics. This has made it difficult to relate the various algorithms and ideas together into a cohesive whole. We have recently been working on unifying these diverse perspectives through the lens of optimization as well as providing software to perform these computations in a cohesive fashion. In this note, we provide a brief introduction to local graph clustering, we provide some representative examples of our perspective, and we introduce our software named Local Graph Clustering (LGC).
Human decision making underlies data generating process in multiple application areas, and models explaining and predicting choices made by individuals are in high demand. Discrete choice models are widely studied in economics and computational socia l sciences. As digital social networking facilitates information flow and spread of influence between individuals, new advances in modeling are needed to incorporate social information into these models in addition to characteristic features affecting individual choices. In this paper, we propose two novel models with scalable training algorithms: local logistics graph regularization (LLGR) and latent class graph regularization (LCGR) models. We add social regularization to represent similarity between friends, and we introduce latent classes to account for possible preference discrepancies between different social groups. Training of the LLGR model is performed using alternating direction method of multipliers (ADMM), and training of the LCGR model is performed using a specialized Monte Carlo expectation maximization (MCEM) algorithm. Scalability to large graphs is achieved by parallelizing computation in both the expectation and the maximization steps. The LCGR model is the first latent class classification model that incorporates social relationships among individuals represented by a given graph. To evaluate our two models, we consider three classes of data to illustrate a typical large-scale use case in internet and social media applications. We experiment on synthetic datasets to empirically explain when the proposed model is better than vanilla classification models that do not exploit graph structure. We also experiment on real-world data, including both small scale and large scale real-world datasets, to demonstrate on which types of datasets our model can be expected to outperform state-of-the-art models.
Locally-biased graph algorithms are algorithms that attempt to find local or small-scale structure in a large data graph. In some cases, this can be accomplished by adding some sort of locality constraint and calling a traditional graph algorithm; bu t more interesting are locally-biased graph algorithms that compute answers by running a procedure that does not even look at most of the input graph. This corresponds more closely to what practitioners from various data science domains do, but it does not correspond well with the way that algorithmic and statistical theory is typically formulated. Recent work from several research communities has focused on developing locally-biased graph algorithms that come with strong complementary algorithmic and statistical theory and that are useful in practice in downstream data science applications. We provide a review and overview of this work, highlighting commonalities between seemingly-different approaches, and highlighting promising directions for future work.
Modern graph clustering applications require the analysis of large graphs and this can be computationally expensive. In this regard, local spectral graph clustering methods aim to identify well-connected clusters around a given seed set of reference nodes without accessing the entire graph. The celebrated Approximate Personalized PageRank (APPR) algorithm in the seminal paper by Andersen et al. is one such method. APPR was introduced and motivated purely from an algorithmic perspective. In other words, there is no a priori notion of objective function/optimality conditions that characterizes the steps taken by APPR. Here, we derive a novel variational formulation which makes explicit the actual optimization problem solved by APPR. In doing so, we draw connections between the local spectral algorithm of and an iterative shrinkage-thresholding algorithm (ISTA). In particular, we show that, appropriately initialized ISTA applied to our variational formulation can recover the sought-after local cluster in a time that only depends on the number of non-zeros of the optimal solution instead of the entire graph. In the process, we show that an optimization algorithm which apparently requires accessing the entire graph, can be made to behave in a completely local manner by accessing only a small number of nodes. This viewpoint builds a bridge across two seemingly disjoint fields of graph processing and numerical optimization, and it allows one to leverage well-studied, numerically robust, and efficient optimization algorithms for processing todays large graphs.
We present and analyze a simple, two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves a L1 penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding str ategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive error approximation and provides a tradeoff between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.
We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm performance is more robust when applied to highly nonseparable or ill conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block and the model is minimized emph{approximately/inexactly} to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and acceptance of large step sizes. We present several high probability iteration complexity results to show that convergence of FCD is guaranteed theoretically. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا