Dictionary matching in a stream

261 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Allyx Fontaine

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Raphael Clifford - Allyx Fontaine - Ely Porat

بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.

قيم البحث

110 - Moran Feldman , Ariel Szarf 2021

The problem of finding a maximum size matching in a graph (known as the maximum matching problem) is one of the most classical problems in computer science. Despite a significant body of work dedicated to the study of this problem in the data stream model, the state-of-the-art single-pass semi-streaming algorithm for it is still a simple greedy algorithm that computes a maximal matching, and this way obtains 1/2-approximation. Some previous works described two/three-pass algorithms that improve over this approximation ratio by using their second and third passes to improve the above mentioned maximal matching. One contribution of this paper continuous this line of work by presenting new three-pass semi-streaming algorithms that work along these lines and obtain improved approximation ratios of 0.6111 and 0.5694 for triangle-free and general graphs, respectively. Unfortunately, a recent work (Konrad and Naidu, 2021) shows that the strategy of constructing a maximal matching in the first pass and then improving it in further passes has limitations. Additionally, this technique is unlikely to get us closer to single-pass semi-streaming algorithms obtaining a better than 1/2-approximation. Therefore, it is interesting to come up with algorithms that do something else with their first pass (we term such algorithms non-maximal-matching-first algorithms). No such algorithms are currently known (to the best of our knowledge), and the main contribution of this paper is describing such algorithms that obtain approximation ratios of 0.5384 and 0.5555 in two and three passes, respectively, for general graphs (the result for three passes improves over the previous state-of-the-art, but is worse than the result of this paper mentioned in the previous paragraph for general graphs).

بنى وهياكل البيانات والخوارزميات

Approximate Hamming distance in a stream

82 - Raphael Clifford , Tatiana Starikovskaya 2016

We consider the problem of computing a $(1+epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(epsilon^{-4} log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(epsilon^{-2}sqrt{n}log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(epsilon^{-3} sqrt{n} log^{2} n)$ space and $O(epsilon^{-2} log{n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(epsilon^{-5} sqrt{n} log^{4} n)$ space and $O(epsilon^{-4} log^3 {n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm.

بنى وهياكل البيانات والخوارزميات

Online Matching in a Ride-Sharing Platform

54 - Chinmoy Dutta , Chris Sholley 2018

We propose a formal graph-theoretic model for studying the problem of matching rides online in a ride-sharing platform. Unlike most of the literature on online matching, our model, that we call {em Online Windowed Non-Bipartite Matching} ($mbox{OWNBM }$), pertains to online matching in {em non-bipartite} graphs. We show that the edge-weighted and vertex-weight

بنى وهياكل البيانات والخوارزميات

Internal Pattern Matching Queries in a Text and Applications

124 - Tomasz Kociumaka , Jakub Radoszewski , Wojciech Rytter 2013

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to quer ies about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding $delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.

بنى وهياكل البيانات والخوارزميات

Algorithms for Stable Matching and Clustering in a Grid

63 - David Eppstein n University of California 2017

We study a discrete version of a geometric stable marriage problem originally proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which points in the plane are stably matched to cluster centers, as prioritized by their distances, so t hat each cluster center is apportioned a set of points of equal area. We show that, for a discretization of the problem to an $ntimes n$ grid of pixels with $k$ centers, the problem can be solved in time $O(n^2 log^5 n)$, and we experiment with two slower but more practical algorithms and a hybrid method that switches from one of these algorithms to the other to gain greater efficiency than either algorithm alone. We also show how to combine geometric stable matchings with a $k$-means clustering algorithm, so as to provide a geometric political-districting algorithm that views distance in economic terms, and we experiment with weight

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات