No Arabic abstract
Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.
We consider the problem of computing the diameter of a unicycle graph (i.e., a graph with a unique cycle). We present an O(n) time algorithm for the problem, where n is the number of vertices of the graph. This improves the previous best O(n log n) time solution [Oh and Ahn, ISAAC 2016]. Using this algorithm as a subroutine, we solve the problem of adding a shortcut to a tree so that the diameter of the new graph (which is a unicycle graph) is minimized; our algorithm takes O(n^2 log n) time and O(n) space. The previous best algorithms solve the problem in O(n^2 log^3 n) time and O(n) space [Oh and Ahn, ISAAC 2016], or in O(n^2) time and O(n^2) space [Bil`o, ISAAC 2018].
Let $A$ and $B$ be two point sets in the plane of sizes $r$ and $n$ respectively (assume $r leq n$), and let $k$ be a parameter. A matching between $A$ and $B$ is a family of pairs in $A times B$ so that any point of $A cup B$ appears in at most one pair. Given two positive integers $p$ and $q$, we define the cost of matching $M$ to be $c(M) = sum_{(a, b) in M}|{a-b}|_p^q$ where $|{cdot}|_p$ is the $L_p$-norm. The geometric partial matching problem asks to find the minimum-cost size-$k$ matching between $A$ and $B$. We present efficient algorithms for geometric partial matching problem that work for any powers of $L_p$-norm matching objective: An exact algorithm that runs in $O((n + k^2) {mathop{mathrm{polylog}}} n)$ time, and a $(1 + varepsilon)$-approximation algorithm that runs in $O((n + ksqrt{k}) {mathop{mathrm{polylog}}} n cdot logvarepsilon^{-1})$ time. Both algorithms are based on the primal-dual flow augmentation scheme; the main improvements involve using dynamic data structures to achieve efficient flow augmentations. With similar techniques, we give an exact algorithm for the planar transportation problem running in $O(min{n^2, rn^{3/2}} {mathop{mathrm{polylog}}} n)$ time.
We present classical sublinear-time algorithms for solving low-rank linear systems of equations. Our algorithms are inspired by the HHL quantum algorithm for solving linear systems and the recent breakthrough by Tang of dequantizing the quantum algorithm for recommendation systems. Let $A in mathbb{C}^{m times n}$ be a rank-$k$ matrix, and $b in mathbb{C}^m$ be a vector. We present two algorithms: a sampling algorithm that provides a sample from $A^{-1}b$ and a query algorithm that outputs an estimate of an entry of $A^{-1}b$, where $A^{-1}$ denotes the Moore-Penrose pseudo-inverse. Both of our algorithms have query and time complexity $O(mathrm{poly}(k, kappa, |A|_F, 1/epsilon),mathrm{polylog}(m, n))$, where $kappa$ is the condition number of $A$ and $epsilon$ is the precision parameter. Note that the algorithms we consider are sublinear time, so they cannot write and read the whole matrix or vectors. In this paper, we assume that $A$ and $b$ come with well-known low-overhead data structures such that entries of $A$ and $b$ can be sampled according to some natural probability distributions. Alternatively, when $A$ is positive semidefinite, our algorithms can be adapted so that the sampling assumption on $b$ is not required.
We establish a polynomial-time approximation algorithm for partition functions of quantum spin models at high temperature. Our algorithm is based on the quantum cluster expansion of Netov{c}ny and Redig and the cluster expansion approach to designing algorithms due to Helmuth, Perkins, and Regts. Similar results have previously been obtained by related methods, and our main contribution is a simple and slightly sharper analysis for the case of pairwise interactions on bounded-degree graphs.
Recently, great efforts have been dedicated to researches on the management of large scale graph based data such as WWW, social networks, biological networks. In the study of graph based data management, node disjoint subgraph homeomorphism relation between graphs is more suitable than (sub)graph isomorphism in many cases, especially in those cases that node skipping and node mismatching are allowed. However, no efficient node disjoint subgraph homeomorphism determination (ndSHD) algorithms have been available. In this paper, we propose two computationally efficient ndSHD algorithms based on state spaces searching with backtracking, which employ many heuristics to prune the search spaces. Experimental results on synthetic data sets show that the proposed algorithms are efficient, require relative little time in most of the testing cases, can scale to large or dense graphs, and can accommodate to more complex fuzzy matching cases.