No Arabic abstract
Reachability query is a fundamental problem on graphs, which has been extensively studied in academia and industry. Since graphs are subject to frequent updates in many applications, it is essential to support efficient graph updates while offering good performance in reachability queries. Existing solutions compress the original graph with the Directed Acyclic Graph (DAG) and propose efficient query processing and index update techniques. However, they focus on optimizing the scenarios where the Strong Connected Components(SCCs) remain unchanged and have overlooked the prohibitively high cost of the DAG maintenance when SCCs are updated. In this paper, we propose DBL, an efficient DAG-free index to support the reachability query on dynamic graphs with insertion-only updates. DBL builds on two complementary indexes: Dynamic Landmark (DL) label and Bidirectional Leaf (BL) label. The former leverages landmark nodes to quickly determine reachable pairs whereas the latter prunes unreachable pairs by indexing the leaf nodes in the graph. We evaluate DBL against the state-of-the-art approaches on dynamic reachability index with extensive experiments on real-world datasets. The results have demonstrated that DBL achieves orders of magnitude speedup in terms of index update, while still producing competitive query efficiency.
In the real world a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there exists a path of a bounded length between a pair of nodes, and regular reachability for checking whether there exists a path connecting two nodes such that the node labels on the path form a string in a given regular expression. We develop these algorithms based on partial evaluation, to explore parallel computation. When evaluating a query Q on a distributed graph G, we show that these algorithms possess the following performance guarantees, no matter how G is fragmented and distributed: (1) each site is visited only once; (2) the total network traffic is determined by the size of Q and the fragmentation of G, independent of the size of G; and (3) the response time is decided by the largest fragment of G rather than the entire G. In addition, we show that these algorithms can be readily implemented in the MapReduce framework. Using synthetic and real-life data, we experimentally verify that these algorithms are scalable on large graphs, regardless of how the graphs are distributed.
A temporal graph is a graph in which vertices communicate with each other at specific time, e.g., $A$ calls $B$ at 11 a.m. and talks for 7 minutes, which is modeled by an edge from $A$ to $B$ with starting time 11 a.m. and duration 7 mins. Temporal graphs can be used to model many networks with time-related activities, but efficient algorithms for analyzing temporal graphs are severely inadequate. We study fundamental problems such as answering reachability and time-based path queries in a temporal graph, and propose an efficient indexing technique specifically designed for processing these queries in a temporal graph. Our results show that our method is efficient and scalable in both index construction and query processing.
Graphs are widely used to model data in many application domains. Thanks to the wide spread use of GPS-enabled devices, many applications assign a spatial attribute to graph vertices (e.g., geo-tagged social media). Users may issue a Reachability Query with Spatial Range Predicate (abbr. RangeReach). RangeReach finds whether an input vertex can reach any spatial vertex that lies within an input spatial range. An example of a RangeReach query is: Given a social graph, find whether Alice can reach any of the venues located within the geographical area of Arizona State University. The paper proposes GeoReach an approach that adds spatial data awareness to a graph database management system (GDBMS). GeoReach allows efficient execution of RangeReach queries, yet without compromising a lot on the overall system scalability (measured in terms of storage size and initialization/maintenance time). To achieve that, GeoReach is equipped with a light-weight data structure, namely SPA-Graph, that augments the underlying graph data with spatial indexing directories. When a RangeReach query is issued, the system employs a pruned-graph traversal approach. Experiments based on real system implementation inside Neo4j proves that GEOREACH exhibits up to two orders of magnitude better query response time and up to four times less storage than the state-of-the-art spatial and reachability indexing approaches.
Large-scale graph-structured data arising from social networks, databases, knowledge bases, web graphs, etc. is now available for analysis and mining. Graph-mining often involves relationship queries, which seek a ranked list of interesting interconnections among a given set of entities, corresponding to nodes in the graph. While relationship queries have been studied for many years, using various terminologies, e.g., keyword-search, Steiner-tree in a graph etc., the solutions proposed in the literature so far have not focused on scaling relationship queries to large graphs having billions of nodes and edges, such are now publicly available in the form of linked-open-data. In this paper, we present an algorithm for distributed keyword search (DKS) on large graphs, based on the graph-parallel computing paradigm Pregel. We also present an analytical proof that our algorithm produces an optimally ranked list of answers if run to completion. Even if terminated early, our algorithm produces approximate answers along with bounds. We describe an optimized implementation of our DKS algorithm along with time-complexity analysis. Finally, we report and analyze experiments using an implementation of DKS on Giraph the graph-parallel computing framework based on Pregel, and demonstrate that we can efficiently process relationship queries on large-scale subsets of linked-open-data.
Traditional route planning and $k$ nearest neighbors queries only consider distance or travel time and ignore road safety altogether. However, many travellers prefer to avoid risky or unpleasant road conditions such as roads with high crime rates (e.g., robberies, kidnapping, riots etc.) and bumpy roads. To facilitate safe travel, we introduce a novel query for road networks called the $k$ safest nearby neighbors ($k$SNN) query. Given a query location $v_l$, a distance constraint $d_c$ and a point of interest $p_i$, we define the safest path from $v_l$ to $p_i$ as the path with the highest path safety score among all the paths from $v_l$ to $p_i$ with length less than $d_c$. The path safety score is computed considering the road safety of each road segment on the path. Given a query location $v_l$, a distance constraint $d_c$ and a set of POIs $P$, a $k$SNN query returns $k$ POIs with the $k$ highest path safety scores in $P$ along with their respective safest paths from the query location. We develop two novel indexing structures called $Ct$-tree and a safety score based Voronoi diagram (SNVD). We propose two efficient query processing algorithms each exploiting one of the proposed indexes to effectively refine the search space using the properties of the index. Our extensive experimental study on real datasets demonstrates that our solution is on average an order of magnitude faster than the baselines.