
Cluster-and-Conquer: When Randomness Meets Graph Locality

Published by: Olivier Ruas
Publication date: 2020
Research field: Informatics Engineering
Paper language: English
Author: George Giakkoupis





K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Paradoxically, this random start is also one of the key weaknesses of these algorithms: nodes are initially connected to dissimilar neighbors that lie far away according to the similarity metric. As a result, incremental algorithms must first laboriously explore spurious potential neighbors before they can identify similar nodes and start converging. In this paper, we remove this drawback with Cluster-and-Conquer (C² for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to pre-cluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speed-ups of up to 4.42× while incurring only a negligible loss in terms of KNN quality.
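The abstract does not detail FastRandomHash's internals, so the sketch below is only a rough, hypothetical illustration of the pipeline it describes: recursively split the dataset with cheap random hash functions (random hyperplanes stand in for them here) until clusters are small, then build an initial KNN graph inside each cluster as a better-than-random starting point for greedy neighbors-of-neighbors refinement. All function names and parameters (random_hash_split, initial_knn_from_clusters, max_size) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a Cluster-and-Conquer-style pipeline (not the authors' code).
# Assumption: the pre-clustering step is modelled as a recursive random split;
# the real FastRandomHash mechanism may differ.
import numpy as np

def random_hash_split(items, vectors, max_size, rng):
    """Recursively split `items` with random hyperplane hashes until every
    cluster holds at most `max_size` items (illustration only)."""
    if len(items) <= max_size:
        return [items]
    w = rng.standard_normal(vectors.shape[1])            # random direction
    left = [i for i in items if vectors[i] @ w < 0]
    right = [i for i in items if vectors[i] @ w >= 0]
    if not left or not right:                            # degenerate split
        return [items]
    return (random_hash_split(left, vectors, max_size, rng)
            + random_hash_split(right, vectors, max_size, rng))

def initial_knn_from_clusters(clusters, vectors, k):
    """Build a starting KNN graph by brute force inside each small cluster."""
    knn = {}
    for cluster in clusters:
        for i in cluster:
            dists = sorted((np.linalg.norm(vectors[i] - vectors[j]), j)
                           for j in cluster if j != i)
            knn[i] = [j for _, j in dists[:k]]
    return knn

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((1000, 16))                  # toy dataset
    clusters = random_hash_split(list(range(len(X))), X, max_size=50, rng=rng)
    knn = initial_knn_from_clusters(clusters, X, k=10)
    print(len(clusters), "clusters;", len(knn), "nodes given initial neighbours")
```

The initial graph produced this way would then be handed to a standard greedy neighbors-of-neighbors refinement loop, which is the part the paper says converges faster from a pre-clustered start.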


Read also

Arising user-centric graph applications such as route planning and personalized social network analysis have initiated a shift of paradigms in modern graph processing systems towards multi-query analysis, i.e., processing multiple graph queries in parallel on a shared graph. These applications generate a dynamic number of localized queries around query hotspots such as popular urban areas. However, existing graph processing systems are not yet tailored towards these properties: the employed methods for graph partitioning and synchronization management disregard query locality and dynamism, which leads to high query latency. To this end, we propose the system Q-Graph for multi-query graph analysis that considers query locality on three levels. (i) The query-aware graph partitioning algorithm Q-cut maximizes query locality to reduce communication overhead. (ii) The method for synchronization management, called hybrid barrier synchronization, allows for full exploitation of local queries spanning only a subset of partitions. (iii) Both methods adapt at runtime to changing query workloads in order to maintain and exploit locality. Our experiments show that Q-cut reduces average query latency by up to 57 percent compared to static query-agnostic partitioning algorithms.
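The abstract does not give Q-cut's actual procedure, so the following is only a toy, hypothetical illustration of the general idea of query-aware placement: vertices that are co-accessed by the same queries are greedily assigned to the same partition, subject to a capacity bound. The function name, the capacity parameter, and the tie-breaking rule are assumptions for the example, not the paper's algorithm.

```python
# Toy illustration of query-aware partitioning (NOT the paper's Q-cut):
# greedily co-locate vertices that are touched by the same queries,
# subject to a simple per-partition capacity bound.
from collections import Counter, defaultdict

def query_aware_partition(queries, num_parts, capacity):
    """queries: list of vertex sets, one per query (e.g. a localized
    route-planning query). Returns {vertex: partition_id}."""
    co_access = defaultdict(set)                  # vertex -> co-accessed vertices
    for q in queries:
        for v in q:
            co_access[v] |= (q - {v})
    assignment, load = {}, Counter()
    # Place "hot" vertices (largest co-access neighbourhoods) first.
    for v in sorted(co_access, key=lambda v: -len(co_access[v])):
        scores = Counter()
        for u in co_access[v]:
            if u in assignment:
                scores[assignment[u]] += 1        # prefer co-located partitions
        candidates = [p for p in range(num_parts) if load[p] < capacity]
        best = max(candidates, key=lambda p: (scores[p], -load[p]))
        assignment[v] = best
        load[best] += 1
    return assignment

if __name__ == "__main__":
    # Two query hotspots; each hotspot should end up on one partition.
    queries = [{0, 1, 2}, {1, 2, 3}, {7, 8, 9}, {8, 9, 10}]
    print(query_aware_partition(queries, num_parts=2, capacity=6))
```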
Services computing can offer a high-level abstraction to support diverse applications via encapsulating various computing infrastructures. Though services computing has greatly boosted the productivity of developers, it faces three main challenges: privacy and security risks, information silos, and pricing mechanisms and incentives. The recent advances of blockchain bring opportunities to address the challenges of services computing due to its built-in encryption and digital signature schemes, decentralization, and intrinsic incentive mechanisms. In this paper, we present a survey that investigates the integration of blockchain with services computing. This integration mainly exhibits merits in two aspects: i) blockchain can potentially address key challenges of services computing, and ii) services computing can also promote blockchain development. In particular, we categorize the current literature on blockchain-based services computing into five types: services creation, services discovery, services recommendation, services composition, and services arbitration. Moreover, we generalize the Blockchain as a Service (BaaS) architecture and summarize representative BaaS platforms. In addition, we outline open issues of blockchain-based services computing and BaaS.
In this paper, we make a first attempt to incorporate both commuting demand and transit network connectivity in bus route planning (CT-Bus), and formulate it as a constrained optimization problem: planning a new bus route with k edges over an existing transit network without building new bus stops to maximize a linear aggregation of commuting demand and connectivity of the transit network. We prove the NP-hardness of CT-Bus and propose an expansion-based greedy algorithm that iteratively scans potential candidate paths in the network. To boost the efficiency of computing the connectivity of new networks with candidate paths, we convert it to a matrix trace estimation problem and employ a Lanczos method to estimate the natural connectivity of the transit network with a guaranteed error bound. Furthermore, we derive upper bounds on the objective values and use them to greedily select candidates for expansion. Our experiments conducted on real-world transit networks in New York City and Chicago verify the efficiency, effectiveness, and scalability of our algorithms.
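As background on the quantity being estimated: the natural connectivity of a graph with adjacency matrix A is ln(trace(exp(A))/n). The sketch below is not the paper's implementation; it computes the value exactly from the eigenvalues on a small dense graph and, in the spirit of the Lanczos-based trace estimator the authors mention, also with a simple Hutchinson-style stochastic estimate. The toy random graph, the probe count, and the function names are assumptions for illustration.

```python
# Illustrative sketch of natural connectivity, ln(trace(exp(A)) / n),
# computed exactly and via stochastic trace estimation (not the paper's code).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def natural_connectivity_exact(A_dense):
    """Exact value from the full eigen-decomposition (small graphs only)."""
    eigvals = np.linalg.eigvalsh(A_dense)
    n = A_dense.shape[0]
    return np.log(np.exp(eigvals).sum() / n)

def natural_connectivity_hutchinson(A_sparse, num_probes=64, seed=0):
    """Stochastic estimate of trace(exp(A)) with Rademacher probes,
    in the spirit of Lanczos/Hutchinson trace estimators."""
    rng = np.random.default_rng(seed)
    n = A_sparse.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))    # Rademacher probes
    Y = spla.expm_multiply(A_sparse, Z)                   # exp(A) @ Z
    trace_est = np.einsum('ij,ij->', Z, Y) / num_probes   # mean of z^T exp(A) z
    return np.log(trace_est / n)

if __name__ == "__main__":
    # Toy random graph as a stand-in for a transit network.
    rng = np.random.default_rng(1)
    n = 200
    A = (rng.random((n, n)) < 0.05).astype(float)
    A = np.triu(A, 1); A = A + A.T                        # undirected, no self-loops
    print("exact      :", natural_connectivity_exact(A))
    print("stochastic :", natural_connectivity_hutchinson(sp.csr_matrix(A)))
```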
In this paper, we study how the Pruned Landmark Labeling (PLL) algorithm can be parallelized in a scalable fashion, producing the same results as the sequential algorithm. More specifically, we parallelize using a Vertex-Centric (VC) computational model on a modern SIMD-powered multicore architecture. We design a new VC-PLL algorithm that resolves the apparent mismatch between the inherent sequential dependence of the PLL algorithm and the VC computing model. Furthermore, we introduce a novel batch execution model for VC computation and the BVC-PLL algorithm to reduce the computational inefficiency in VC-PLL. Quite surprisingly, the theoretical analysis reveals that, under a reasonable assumption, BVC-PLL has lower computational and memory access costs than PLL and indicates it may run faster than PLL as a sequential algorithm. We also demonstrate how the BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how it can utilize the hierarchical parallelism on a modern parallel computing architecture. Extensive experiments on real-world graphs not only show that sequential BVC-PLL can run more than two times faster than the original PLL, but also demonstrate its parallel efficiency and scalability.
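For readers unfamiliar with the labeling scheme being parallelized, here is a minimal sequential sketch of pruned landmark labeling and its 2-hop distance query. It only illustrates the underlying idea; it is not the paper's VC-PLL or BVC-PLL algorithm, and the toy graph and degree-based vertex order are assumptions for the example.

```python
# Minimal sketch of sequential Pruned Landmark Labeling (PLL) and its
# 2-hop distance query, for intuition only (not the paper's VC/BVC variant).
from collections import deque

def query(labels_u, labels_v):
    """2-hop query: min over shared landmarks w of d(u, w) + d(w, v)."""
    best = float('inf')
    for w, du in labels_u.items():
        dv = labels_v.get(w)
        if dv is not None and du + dv < best:
            best = du + dv
    return best

def pruned_landmark_labeling(adj, order):
    """Build labels by pruned BFS from each vertex in `order`
    (typically highest degree first)."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        frontier = deque([root])
        while frontier:
            u = frontier.popleft()
            d = dist[u]
            # Prune: existing labels already certify a distance <= d.
            if query(labels[root], labels[u]) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    frontier.append(w)
    return labels

if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
    order = sorted(adj, key=lambda v: -len(adj[v]))       # degree order
    L = pruned_landmark_labeling(adj, order)
    print(query(L[3], L[0]))                              # shortest 3 -> 0 = 2
```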
According to quantum theory, the outcomes obtained by measuring an entangled state necessarily exhibit some randomness if they violate a Bell inequality. In particular, a maximal violation of the CHSH inequality guarantees that 1.23 bits of randomness are generated by the measurements. However, by performing measurements with binary outcomes on two subsystems one could in principle generate up to two bits of randomness. We show that correlations that violate the CHSH inequality by an arbitrarily small amount, or states with arbitrarily little entanglement, can be used to certify that close to the maximum of two bits of randomness are produced. Our results show that non-locality, entanglement, and the amount of randomness that can be certified in a Bell-type experiment are inequivalent quantities. From a practical point of view, they imply that device-independent quantum key distribution with optimal key generation rate is possible using almost-local correlations, and that device-independent randomness generation with optimal rate is possible with almost-local correlations and with almost-unentangled states.