
Internal Pattern Matching Queries in a Text and Applications


Added by Tomasz Kociumaka

Publication date: 2013

Field: Informatics Engineering

Research language: English

Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter

Data Structures and Algorithms


We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of efficient solutions for subword suffix rank and selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding.
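To make the query types concrete, here is a naive Python sketch of what the queries return; it is not the data structure from the paper and runs in time proportional to the fragment lengths rather than in constant time. The function names and the half-open `(start, end)` ranges are assumptions made for illustration. It relies on the standard periodicity fact that when $|y| < 2|x|$ the occurrences of $x$ in $y$ form a single arithmetic progression, which is why three integers suffice to describe them.

```python
def occurrences(text, x_range, y_range):
    """Naively list start positions (in text) where fragment x occurs inside fragment y.

    Ranges are half-open (start, end) pairs; this convention is an assumption.
    """
    xs, xe = x_range
    ys, ye = y_range
    x = text[xs:xe]
    return [i for i in range(ys, ye - len(x) + 1) if text[i:i + len(x)] == x]


def as_progression(positions):
    """Compress sorted positions into (first, step, count).

    Assumes the positions form an arithmetic progression, which holds
    when |y| < 2|x|; the assert documents that assumption.
    """
    if not positions:
        return None
    if len(positions) == 1:
        return (positions[0], 0, 1)
    step = positions[1] - positions[0]
    assert all(b - a == step for a, b in zip(positions, positions[1:]))
    return (positions[0], step, len(positions))


def is_primitive(w):
    """A non-empty word w is primitive iff w occurs in ww only at offsets 0 and |w|."""
    return (w + w).find(w, 1) == len(w)


def cyclically_equivalent(u, v):
    """u and v are rotations of each other iff they have equal length and v occurs in uu."""
    return len(u) == len(v) and v in u + u


text = "abababababa"
# x = text[0:5] = "ababa", y = text[0:9] = "ababababa", so |y| = 9 < 2 * 5
print(as_progression(occurrences(text, (0, 5), (0, 9))))  # (0, 2, 3)
```

The primitivity and cyclic-equivalence helpers both reduce to finding occurrences of one word inside another of at most twice its length, which is the shape of query the abstract describes.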


Pattern Matching in Multiple Streams

Raphael Clifford, Markus Jalsenius, Ely Porat (2012)

We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.
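As a rough illustration of where an O(m+s) space bound for exact matching can come from, the sketch below stores one Knuth-Morris-Pratt failure table for the pattern (O(m) words) plus one automaton state per stream (O(s) words). This is a swapped-in textbook technique, not the algorithm from the paper: KMP gives only amortized constant time per symbol, whereas the abstract claims constant time per arriving symbol. The class and method names are hypothetical, and the pattern is assumed to be non-empty.

```python
class MultiStreamMatcher:
    """KMP-based exact matcher over several streams (illustrative sketch)."""

    def __init__(self, pattern, num_streams):
        self.pattern = pattern               # assumed non-empty
        self.fail = self._failure(pattern)   # O(m) words, shared by all streams
        self.state = [0] * num_streams       # O(s) words, one state per stream

    @staticmethod
    def _failure(p):
        """fail[i] = length of the longest proper border of p[:i + 1]."""
        fail = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k and p[i] != p[k]:
                k = fail[k - 1]
            if p[i] == p[k]:
                k += 1
            fail[i] = k
        return fail

    def feed(self, stream_id, symbol):
        """Append symbol to one stream; return True if the pattern now ends there."""
        k = self.state[stream_id]
        if k == len(self.pattern):           # previous symbol completed a match
            k = self.fail[k - 1]
        while k and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        self.state[stream_id] = k
        return k == len(self.pattern)


matcher = MultiStreamMatcher("aba", num_streams=2)
arrivals = [(0, "a"), (1, "a"), (0, "b"), (1, "a"), (0, "a"), (1, "b"), (0, "b"), (1, "a"), (0, "a")]
print([matcher.feed(s, c) for s, c in arrivals])
```

Symbols from different streams can be interleaved in any order because each call touches only the state of its own stream.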

Data Structures and Algorithms

Internal Shortest Absent Word Queries in Constant Time and Linear Space

Golnaz Badkobeh, Panagiotis Charalampopoulos, Dmitry Kosolobov (2021)

Given a string $T$ of length $n$ over an alphabet $\Sigma\subset \{1,2,\ldots,n^{O(1)}\}$ of size $\sigma$, we are to preprocess $T$ so that given a range $[i,j]$, we can return a representation of a shortest string over $\Sigma$ that is absent in the fragment $T[i]\cdots T[j]$ of $T$. We present an $O(n)$-space data structure that answers such queries in constant time and can be constructed in $O(n\log_\sigma n)$ time.
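The query itself can be pinned down with a brute-force sketch that tries candidate words by increasing length. This only illustrates what a query returns, not the constant-time structure from the abstract. The function name, 0-based inclusive indexing, and passing the alphabet as a non-empty string of characters are assumptions made here.

```python
from itertools import product


def shortest_absent_word(text, i, j, alphabet):
    """Return a shortest word over `alphabet` that does not occur in text[i..j] (inclusive).

    Brute force: for each length, collect the substrings present in the
    fragment and return the first candidate (in product order) that is missing.
    """
    fragment = text[i:j + 1]
    length = 1
    while True:
        present = {fragment[k:k + length] for k in range(len(fragment) - length + 1)}
        for candidate in product(alphabet, repeat=length):
            word = "".join(candidate)
            if word not in present:
                return word
        length += 1


print(shortest_absent_word("abaab", 0, 4, "ab"))  # bb
```

The loop always terminates: a fragment of length L has no substrings of length L + 1, so some candidate of that length is absent at the latest.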

Data Structures and Algorithms

Pattern Matching under Polynomial Transformation

Ayelet Butman, Peter Clifford, Raphael Clifford (2011)

We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where application specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution which we then improve by careful use of randomisation to O(n sqrt(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.
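For intuition, the following naive O(nm) sketch checks every alignment for an exact match under a linear transformation. It assumes one reading of the problem: the pattern $P$ matches the text $T$ at alignment $i$ if some $a, b$ satisfy $a \cdot P[k] + b = T[i+k]$ for all $k$, with rational $a$ and $a = 0$ allowed. The abstract does not spell out this definition, and the code is not the O(n log m) algorithm it refers to; the function name is hypothetical and wildcards are not handled.

```python
def linear_matches(pattern, text):
    """Alignments i where text[i:i+m] == a * pattern + b for some a, b (naive check).

    Uses integer cross-multiplication so no division or floating point is needed.
    """
    m = len(pattern)
    # First index whose value differs from pattern[0]; None if the pattern is constant.
    q = next((k for k in range(1, m) if pattern[k] != pattern[0]), None)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if q is None:
            # Constant pattern: any a works, so the window must be constant too.
            ok = all(v == window[0] for v in window)
        else:
            d_p = pattern[q] - pattern[0]    # nonzero by choice of q
            d_t = window[q] - window[0]      # slope a equals d_t / d_p
            ok = all(
                (window[k] - window[0]) * d_p == d_t * (pattern[k] - pattern[0])
                for k in range(m)
            )
        if ok:
            hits.append(i)
    return hits


print(linear_matches([1, 2, 3], [5, 7, 9, 11, 0]))  # [0, 1]
```

In the example, both matching windows are the pattern mapped through $x \mapsto 2x + 3$ and $x \mapsto 2x + 5$ respectively.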

Data Structures and Algorithms

Stochastic Matching with Few Queries: New Algorithms and Tools

Soheil Behnezhad, Alireza Farhadi, MohammadTaghi Hajiaghayi, Nima Reyhani (2018)

We consider the following stochastic matching problem on both weighted and unweighted graphs: A graph $G(V, E)$ along with a parameter $p \in (0, 1)$ is given in the input. Each edge of $G$ is realized independently with probability $p$. The goal is to select a degree bounded (dependent only on $p$) subgraph $H$ of $G$ such that the expected size/weight of maximum realized matching of $H$ is close to that of $G$. This model of stochastic matching has attracted significant attention over the recent years due to its various applications. The most fundamental open question is the best approximation factor achievable for such algorithms that, in the literature, are referred to as non-adaptive algorithms. Prior work has identified breaking (near) half-approximation as a barrier for both weighted and unweighted graphs. Our main results are as follows:

-- We analyze a simple and clean algorithm and show that for unweighted graphs, it finds an (almost) $4\sqrt{2}-5$ ($\approx 0.6568$) approximation by querying $O(\frac{\log (1/p)}{p})$ edges per vertex. This improves over the state-of-the-art $0.5001$ approximate algorithm of Assadi et al. [EC17].

-- We show that the same algorithm achieves a $0.501$ approximation for weighted graphs by querying $O(\frac{\log (1/p)}{p})$ edges per vertex. This is the first algorithm to break $0.5$ approximation barrier for weighted graphs. It also improves the per-vertex queries of the state-of-the-art by Yamaguchi and Maehara [SODA18] and Behnezhad and Reyhani [EC18].

Our algorithms are fundamentally different from prior works, yet are very simple and natural. For the analysis, we introduce a number of procedures that construct heavy fractional matchings. We consider the new algorithms and our analytical tools to be the main contributions of this paper.

Data Structures and Algorithms

Optimal Space and Time for Streaming Pattern Matching

Tung Mai, Anup Rao, Ryan A. Rossi (2021)

In this work, we study longest common substring, pattern matching, and wildcard pattern matching in the asymmetric streaming model. In this streaming model, we have random access to one string and streaming access to the other one. We present streaming algorithms with provable guarantees for these three fundamental problems. In particular, our algorithms for pattern matching improve the upper bound and beat the unconditional lower bounds on the memory of randomized and deterministic streaming algorithms. In addition to this, we present algorithms for wildcard pattern matching in the asymmetric streaming model that have optimal space and time.

Data Structures and Algorithms
