ترغب بنشر مسار تعليمي؟ اضغط هنا

Efficient Top-k Vulnerable Nodes Detection in Uncertain Graphs

59   0   0.0 ( 0 )
 نشر من قبل Dawei Cheng
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Uncertain graphs have been widely used to model complex linked data in many real-world applications, such as guaranteed-loan networks and power grids, where a node or edge may be associated with a probability. In these networks, a node usually has a certain chance of default or breakdown due to self-factors or the influence from upstream nodes. For regulatory authorities and companies, it is critical to efficiently identify the vulnerable nodes, i.e., nodes with high default risks, such that they could pay more attention to these nodes for the purpose of risk management. In this paper, we propose and investigate the problem of top-$k$ vulnerable nodes detection in uncertain graphs. We formally define the problem and prove its hardness. To identify the $k$ most vulnerable nodes, a sampling-based approach is proposed. Rigorous theoretical analysis is conducted to bound the quality of returned results. Novel optimization techniques and a bottom-$k$ sketch based approach are further developed in order to scale for large networks. In the experiments, we demonstrate the performance of proposed techniques on 3 real financial networks and 5 benchmark networks. The evaluation results show that the proposed methods can achieve up to 2 orders of magnitudes speedup compared with the baseline approach. Moreover, to further verify the advantages of our model in real-life scenarios, we integrate the proposed techniques with our current loan risk control system, which is deployed in the collaborated bank, for more evaluation. Particularly, we show that our proposed new model has superior performance on real-life guaranteed-loan network data, which can better predict the default risks of enterprises compared to the state-of-the-art techniques.



قيم البحث

اقرأ أيضاً

Network reliability is an important metric to evaluate the connectivity among given vertices in uncertain graphs. Since the network reliability problem is known as #P-complete, existing studies have used approximation techniques. In this paper, we pr opose a new sampling-based approach that efficiently and accurately approximates network reliability. Our approach improves efficiency by reducing the number of samples based on stratified sampling. We theoretically guarantee that our approach improves the accuracy of approximation by using lower and upper bounds of network reliability, even though it reduces the number of samples. To efficiently compute the bounds, we develop an extended BDD, called S2BDD. During constructing the S2BDD, our approach employs dynamic programming for efficiently sampling possible graphs. Our experiment with real datasets demonstrates that our approach is up to 51.2 times faster than the existing sampling-based approach with higher accuracy.
Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. This work presents ECQA, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. ECQA is based on top-k query processing over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks demonstrate the superiority of ECQA over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clusterin g methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including: gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
Betweenness centrality, measured by the number of times a vertex occurs on all shortest paths of a graph, has been recognized as a key indicator for the importance of a vertex in the network. However, the betweenness of a vertex is often very hard to compute because it needs to explore all the shortest paths between the other vertices. Recently, a relaxed concept called ego-betweenness was introduced which focuses on computing the betweenness of a vertex in its ego network. In this work, we study a problem of finding the top-k vertices with the highest ego-betweennesses. We first develop two novel search algorithms equipped with a basic upper bound and a dynamic upper bound to efficiently solve this problem. Then, we propose local-update and lazy-update solutions to maintain the ego-betweennesses for all vertices and the top-k results when the graph is updated, respectively. In addition, we also present two efficient parallel algorithms to further improve the efficiency. The results of extensive experiments on five large real-life datasets demonstrate the efficiency, scalability, and effectiveness of our algorithms.
Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible to directly use them with popular pre-trained language models trained using vanilla attention, without an expensive corrective pre-training stage. In this work, we propose a simple yet highly accurate approximation for vanilla attention. We process the queries in chunks, and for each query, compute the top-$k$ scores with respect to the keys. Our approach offers several advantages: (a) its memory usage is linear in the input size, similar to linear attention variants, such as Performer and RFA (b) it is a drop-in replacement for vanilla attention that does not require any corrective pre-training, and (c) it can also lead to significant memory savings in the feed-forward layers after casting them into the familiar query-key-value framework. We evaluate the quality of top-$k$ approximation for multi-head attention layers on the Long Range Arena Benchmark, and for feed-forward layers of T5 and UnifiedQA on multiple QA datasets. We show our approach leads to accuracy that is nearly-identical to vanilla attention in multiple setups including training from scratch, fine-tuning, and zero-shot inference.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا