No Arabic abstract
We describe a new statistical test for pseudorandom number generators (PRNGs). Our test can find bias induced by dependencies among the Hamming weights of the outputs of a PRNG, even for PRNGs that pass state-of-the-art tests of the same kind from the literature, and in particular for generators based on F_2-linear transformations such as the dSFMT, xoroshiro128+, and WELL512.
We present a set of new instances of the maximum weight independent set problem. These instances are derived from a real-world vehicle routing problem and are challenging to solve in part because of their large size. We present instances with up to 881 thousand nodes and 383 million edges.
We consider the problem of computing a $(1+epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(epsilon^{-4} log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(epsilon^{-2}sqrt{n}log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(epsilon^{-3} sqrt{n} log^{2} n)$ space and $O(epsilon^{-2} log{n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(epsilon^{-5} sqrt{n} log^{4} n)$ space and $O(epsilon^{-4} log^3 {n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm.
We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of Omega((d/w)*log n) time on average per output, where d is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $sigma$, compute the Hamming distance between the pattern and the text at every location. Several $(1+epsilon)$-approximation algorithms have been proposed in the literature, with running time of the form $O(epsilon^{-O(1)}nlog nlog m)$, all using fast Fourier transform (FFT). We describe a simple $(1+epsilon)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results: - We obtain the first linear-time approximation algorithm; the running time is $O(epsilon^{-2}n)$. - We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $klesqrt m$. - We obtain approximation algorithms with better $epsilon$-dependence using rectangular matrix multiplication. The time-bound is $~O(n)$ when the pattern is sufficiently long: $mge epsilon^{-28}$. Previous algorithms require $~O(epsilon^{-1}n)$ time. - When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Omega(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $delta m$ at every location, running in $~O(delta^{-1/3}n^{2/3}+delta^{-1}n/m)$ time. - We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $~O(epsilon^{-2}sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with ~O(k) space or for the approximate problem with $~O(epsilon^{-O(1)}sqrt m)$ space.
We present a massively parallel algorithm, with near-linear memory per machine, that computes a $(2+varepsilon)$-approximation of minimum-weight vertex cover in $O(loglog d)$ rounds, where $d$ is the average degree of the input graph. Our result fills the key remaining gap in the state-of-the-art MPC algorithms for vertex cover and matching problems; two classic optimization problems, which are duals of each other. Concretely, a recent line of work---by Czumaj et al. [STOC18], Ghaffari et al. [PODC18], Assadi et al. [SODA19], and Gamlath et al. [PODC19]---provides $O(loglog n)$ time algorithms for $(1+varepsilon)$-approximate maximum weight matching as well as for $(2+varepsilon)$-approximate minimum cardinality vertex cover. However, the latter algorithm does not work for the general weighted case of vertex cover, for which the best known algorithm remained at $O(log n)$ time complexity.