Uncertain graphs are widely used to model complex linked data in real-world applications such as guaranteed-loan networks and power grids, where a node or edge may be associated with a probability. In these networks, a node usually has a certain chance of default or breakdown, due either to its own factors or to the influence of upstream nodes. For regulatory authorities and companies, it is critical to efficiently identify the vulnerable nodes, i.e., nodes with high default risk, so that they can focus their risk management on these nodes. In this paper, we propose and investigate the problem of top-$k$ vulnerable node detection in uncertain graphs. We formally define the problem and prove its hardness. To identify the $k$ most vulnerable nodes, we propose a sampling-based approach and conduct a rigorous theoretical analysis to bound the quality of the returned results. We further develop novel optimization techniques and a bottom-$k$ sketch based approach to scale to large networks. In the experiments, we evaluate the proposed techniques on 3 real financial networks and 5 benchmark networks. The results show that the proposed methods achieve up to two orders of magnitude speedup over the baseline approach. Moreover, to further verify the advantages of our model in real-life scenarios, we integrate the proposed techniques into our current loan risk control system, which is deployed at a collaborating bank, for further evaluation. In particular, we show that the proposed model performs better on real-life guaranteed-loan network data, predicting the default risks of enterprises more accurately than state-of-the-art techniques.
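The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of how a sampling-based top-$k$ estimate might look, not the paper's algorithm. It assumes a simple model in which each node defaults on its own with an independent probability and a defaulted node passes the default along each outgoing edge with that edge's probability. The names `estimate_default_probs`, `top_k_vulnerable`, `self_risk`, and `edges` are hypothetical and chosen for illustration.

```python
import random
from collections import deque


def estimate_default_probs(self_risk, edges, num_samples=2000, seed=0):
    """Monte Carlo estimate of each node's default probability.

    self_risk: dict node -> probability of defaulting on its own (assumed model).
    edges: dict (u, v) -> probability that a default at u spreads to v (assumed model).
    """
    rng = random.Random(seed)
    # Adjacency list of outgoing edges for every node.
    out = {v: [] for v in self_risk}
    for (u, v), p in edges.items():
        out[u].append((v, p))

    counts = {v: 0 for v in self_risk}
    for _ in range(num_samples):
        # Sample which nodes default by themselves in this possible world.
        seeds = [v for v, p in self_risk.items() if rng.random() < p]
        defaulted = set(seeds)
        queue = deque(seeds)
        # Propagate defaults downstream; each edge is tried at most once per sample.
        while queue:
            u = queue.popleft()
            for v, p in out[u]:
                if v not in defaulted and rng.random() < p:
                    defaulted.add(v)
                    queue.append(v)
        for v in defaulted:
            counts[v] += 1
    return {v: c / num_samples for v, c in counts.items()}


def top_k_vulnerable(self_risk, edges, k, num_samples=2000, seed=0):
    """Return the k nodes with the highest estimated default probability."""
    probs = estimate_default_probs(self_risk, edges, num_samples, seed)
    # Ties are broken by node name so the output is deterministic.
    return sorted(probs, key=lambda v: (-probs[v], v))[:k]
```

Under these assumptions the estimate for each node is simply the fraction of sampled worlds in which it defaults, so accuracy improves with `num_samples` at a linear cost in running time. The optimizations and bottom-$k$ sketch mentioned in the abstract are not reproduced here.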
Network reliability is an important metric for evaluating the connectivity among given vertices in uncertain graphs. Since the network reliability problem is known to be #P-complete, existing studies have used approximation techniques. In this paper, we pr
Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of
Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clusterin
Betweenness centrality, measured by the number of times a vertex occurs on all shortest paths of a graph, has been recognized as a key indicator for the importance of a vertex in the network. However, the betweenness of a vertex is often very hard to
Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not