The problem of matching a query string to a directed graph whose vertices are labeled by strings has applications in different fields, from data mining to computational biology. Several variants of the problem have been considered, depending on whether the match is exact or approximate and, in the latter case, on which edit operations are considered and where they are allowed. In this paper we present results on the complexity of the approximate matching problem, where the edit operations are symbol substitutions and are allowed either only on the graph labels or on both the graph labels and the query string. We introduce a variant of the problem that asks whether there exists a path in a graph that represents a query string with any number of edit operations, and we show that it is NP-complete, even when labels have length one and the alphabet is binary. Moreover, when it is parameterized by the length of the input string and graph labels have length one, we show that the problem is fixed-parameter tractable and that it is unlikely to admit a polynomial kernel. The NP-completeness of this problem implies the inapproximability (within any factor) of approximate matching when edit operations are allowed only on the graph labels. Moreover, we show that the variants of approximate string matching to graphs that we consider are not fixed-parameter tractable when the parameter is the number of edit operations, even for graphs that have distance one from a DAG. The reduction for this latter result allows us to prove the inapproximability of the variant where edit operations can be applied both on the query string and on the graph labels.
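To make the problem statement concrete, the following is a minimal sketch of the exact-match baseline only, not of the approximate variants whose hardness is discussed above. It assumes that every vertex label has length one and that a match is a walk that may revisit vertices. The function name `matches_exactly` and the dictionary-based graph encoding are illustrative choices, not taken from the paper.

```python
def matches_exactly(labels, edges, query):
    """Return True if some walk in the graph spells `query` exactly.

    labels: dict vertex -> single-character label (assumption: length-one labels)
    edges:  dict vertex -> iterable of out-neighbours (directed graph)
    """
    if not query:
        return True
    # Vertices at which a walk spelling the current prefix of the query can end.
    frontier = {v for v, ch in labels.items() if ch == query[0]}
    for ch in query[1:]:
        # Extend every partial walk by one edge whose head carries the next symbol.
        frontier = {w for v in frontier for w in edges.get(v, ()) if labels[w] == ch}
        if not frontier:
            return False
    return bool(frontier)


# A directed 3-cycle whose vertices are labeled a, b, a.
labels = {0: "a", 1: "b", 2: "a"}
edges = {0: [1], 1: [2], 2: [0]}
print(matches_exactly(labels, edges, "abaa"))
print(matches_exactly(labels, edges, "abb"))
```

With no edit operations this frontier propagation takes time proportional to the query length times the number of edges. The variants studied in the abstract differ in that substitutions may be applied to the graph labels.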
Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum-length subsequence of a given string that contains at most one run for each symbol (a run is a maximal substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet over which the input string is defined. In this paper we further investigate the complexity of the problem and show that it is fixed-parameter tractable when parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence in which each symbol has at most two occurrences in the input string, and we show that it is APX-hard.
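The problem definition can be illustrated with a small dynamic program over subsets of the alphabet. This is a sketch under our own assumptions, not the algorithm of the paper or of Schrinner et al.; it only shows why the alphabet size is a natural parameter. The function name `longest_run_subsequence` is hypothetical. The idea is that the last run of an optimal solution takes every occurrence of its symbol inside some window `s[i:j]`, while the earlier runs lie in `s[:i]` and use the remaining symbols.

```python
from itertools import accumulate


def longest_run_subsequence(s: str) -> int:
    """Length of a longest subsequence of s with at most one run per symbol."""
    symbols = sorted(set(s))
    n, k = len(s), len(symbols)
    # prefix[c][i] = number of occurrences of symbol c in s[:i]
    prefix = {
        c: [0] + list(accumulate(1 if ch == c else 0 for ch in s))
        for c in symbols
    }
    # best[mask][j] = best solution inside s[:j] using only symbols in mask
    best = [[0] * (n + 1) for _ in range(1 << k)]
    for mask in range(1, 1 << k):
        row = best[mask]
        for b, c in enumerate(symbols):
            if not mask & (1 << b):
                continue
            prev = best[mask ^ (1 << b)]  # solutions that do not use c
            cnt = prefix[c]
            running = float("-inf")
            for j in range(n + 1):
                # running = max over i <= j of prev[i] - cnt[i]
                running = max(running, prev[j] - cnt[j])
                # last run: all occurrences of c in s[i:j]
                row[j] = max(row[j], running + cnt[j])
    return best[(1 << k) - 1][n]


print(longest_run_subsequence("abab"))    # 3, e.g. "aab"
print(longest_run_subsequence("aabbaa"))  # 4, e.g. "aaaa"
```

The table has 2^σ rows for an alphabet of size σ, so the running time is exponential only in the alphabet size and polynomial in the string length.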
We consider the problems of deciding whether an input graph can be modified by removing/adding at most $k$ vertices/edges such that the result of the modification satisfies some property definable in first-order logic. We establish a number of sufficient and necessary conditions on the quantification pattern of the first-order formula $\phi$ for the problem to be fixed-parameter tractable or to admit a polynomial kernel.
We investigate the parameterized complexity of finding subgraphs with hereditary properties on graphs belonging to a hereditary graph class. Given a graph $G$, a non-trivial hereditary property $\Pi$ and an integer parameter $k$, the general problem $P(G,\Pi,k)$ asks whether there exist $k$ vertices of $G$ that induce a subgraph satisfying property $\Pi$. This problem has been proved to be NP-complete by Lewis and Yannakakis. Khot and Raman showed that it is W[1]-complete if $\Pi$ includes all trivial graphs but not all complete graphs, or vice versa, and that it is fixed-parameter tractable (FPT) otherwise. Since the problem is W[1]-complete on general graphs in the former case, it is natural to further investigate it on restricted graph classes. Motivated by this line of research, we study the problem on graphs which also belong to a hereditary graph class and establish a framework which settles the parameterized complexity of the problem for various hereditary graph classes. In particular, we show the following. $P(G,\Pi,k)$ is solvable in polynomial time when the graph $G$ is co-bipartite and $\Pi$ is the property of being planar, bipartite or triangle-free (or vice versa). $P(G,\Pi,k)$ is FPT when the graph $G$ is planar, bipartite or triangle-free and $\Pi$ is the property of being planar, bipartite or triangle-free, or when $G$ is co-bipartite and $\Pi$ is the property of being co-bipartite. $P(G,\Pi,k)$ is W[1]-complete when the graph $G$ is $C_4$-free, $K_{1,4}$-free or a unit disk graph and $\Pi$ is the property of being either planar or bipartite.
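As an illustration of the problem statement only, the sketch below decides $P(G,\Pi,k)$ by brute force, trying every set of $k$ vertices and testing the induced subgraph against a predicate. It is not one of the algorithms from the paper, and its running time grows like $n^k$, which is exactly what a fixed-parameter algorithm avoids. The names `has_induced_subgraph`, `is_edgeless` and `is_complete` are hypothetical, and the graph is assumed to be simple and given as a dictionary of neighbour sets.

```python
from itertools import combinations


def has_induced_subgraph(adj, k, has_property):
    """Return k vertices whose induced subgraph satisfies the property, or None.

    adj: dict vertex -> set of neighbours (simple undirected graph, no loops)
    has_property: predicate on a graph given in the same dict-of-sets form
    """
    for subset in combinations(sorted(adj), k):
        chosen = set(subset)
        # Induced subgraph: keep only edges with both endpoints in the subset.
        induced = {v: adj[v] & chosen for v in subset}
        if has_property(induced):
            return subset
    return None


def is_edgeless(graph):
    """The property 'trivial graph': no edges at all."""
    return all(not nbrs for nbrs in graph.values())


def is_complete(graph):
    """The property 'complete graph': every vertex adjacent to all others."""
    return all(len(nbrs) == len(graph) - 1 for nbrs in graph.values())


# A cycle on four vertices: 0-1-2-3-0.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(has_induced_subgraph(c4, 2, is_edgeless))  # (0, 2)
print(has_induced_subgraph(c4, 3, is_complete))  # None
```

With `is_edgeless` the problem is k-Independent Set and with `is_complete` it is k-Clique, the two extreme properties referred to above as "trivial graphs" and "complete graphs".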
Understanding spatial correlation is vital in many fields including epidemiology and social science. Lee, Meeks and Pettersson (Stat. Comput. 2021) recently demonstrated that improved inference for areal unit count data can be achieved by carrying out modifications to a graph representing spatial correlations; specifically, they delete edges of the planar graph derived from border-sharing between geographic regions in order to maximise a specific objective function. In this paper we address the computational complexity of the associated graph optimisation problem. We demonstrate that this problem cannot be solved in polynomial time unless P = NP; we further show intractability for two simpler variants of the problem. We follow these results with two parameterised algorithms that exactly solve the problem in polynomial time in restricted settings. The first of these utilises dynamic programming on a tree decomposition, and runs in polynomial time if both the treewidth and maximum degree are bounded. The second algorithm is restricted to problem instances with maximum degree three, as may arise from triangulations of planar surfaces, but is an FPT algorithm when the maximum number of edges that can be removed is taken as the parameter.
Efficient computation of node proximity queries such as transition probabilities, Personalized PageRank, and Katz is of fundamental importance in various graph mining and learning tasks. In particular, several recent works leverage fast node proximity computation to improve the scalability of Graph Neural Networks (GNNs). However, prior studies on proximity computation and GNN feature propagation proceed on a case-by-case basis, with each paper focusing on a particular proximity measure. In this paper, we propose Approximate Graph Propagation (AGP), a unified randomized algorithm that computes various proximity queries and GNN feature propagation, including transition probabilities, Personalized PageRank, heat kernel PageRank, Katz, SGC, GDC, and APPNP. Our algorithm provides a theoretically bounded error guarantee and runs in almost optimal time complexity. We conduct an extensive experimental study to demonstrate AGP's effectiveness in two concrete applications: local clustering with heat kernel PageRank and node classification with GNNs. Most notably, we present an empirical study on Papers100M, a billion-edge graph and the largest publicly available GNN dataset so far. The results show that AGP can significantly improve the scalability of various existing GNN models without sacrificing prediction accuracy.
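Several of the proximity measures listed above can be written as a weighted sum of powers of a random-walk matrix applied to a start vector, which is what makes a unified treatment plausible. The sketch below evaluates such a sum by plain truncated power iteration. It is a deterministic baseline under our own assumptions and is not the randomized AGP algorithm. The names `propagate`, `ppr_weights` and `heat_kernel_weights` and the adjacency-list encoding are hypothetical.

```python
import math


def propagate(adj, x, weights):
    """Return sum_i weights[i] * P^i x, where P moves mass along random-walk steps.

    adj: dict node -> list of out-neighbours
    x:   dict node -> initial mass (missing nodes count as 0)
    Assumption: mass sitting on a node without out-neighbours is dropped.
    """
    result = {v: 0.0 for v in adj}
    current = {v: float(x.get(v, 0.0)) for v in adj}
    for w in weights:
        for v in adj:
            result[v] += w * current[v]
        # One random-walk step: each node splits its mass evenly among neighbours.
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = current[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        current = nxt
    return result


def ppr_weights(alpha, terms):
    """Personalized PageRank: w_i = alpha * (1 - alpha)^i, truncated."""
    return [alpha * (1 - alpha) ** i for i in range(terms)]


def heat_kernel_weights(t, terms):
    """Heat kernel PageRank: w_i = e^{-t} t^i / i!, truncated."""
    return [math.exp(-t) * t ** i / math.factorial(i) for i in range(terms)]


# Two nodes joined by an edge; the walk starts at node 0.
adj = {0: [1], 1: [0]}
pi = propagate(adj, {0: 1.0}, ppr_weights(0.5, 60))
print(pi)  # approximately {0: 2/3, 1: 1/3}
```

Swapping the weight sequence changes the proximity measure without touching the propagation loop. Each term of the sum costs one pass over all edges, which is the cost that approximate methods aim to reduce on large graphs.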