No Arabic abstract
xorshift* generators are a variant of Marsaglias xorshift generators that eliminate linear artifacts typical of generators based on $mathbf Z/2mathbf Z$-linear operations using multiplication by a suitable constant. Shortly after high-dimensional xorshift* generators were introduced, Saito and Matsumoto suggested a different way to eliminate linear artifacts based on addition in $mathbf Z/2^{32}mathbf Z$, leading to the XSadd generator. Starting from the observation that the lower bits of XSadd are very weak, as its reverse fails systematically several statistical tests, we explore xorshift+, a variant of XSadd using 64-bit operations, which leads, in small dimension, to extremely fast high-quality generators.
Marsaglia proposed recently xorshift generators as a class of very fast, good-quality pseudorandom number generators. Subsequent analysis by Panneton and LEcuyer has lowered the expectations raised by Marsaglias paper, showing several weaknesses of such generators, verified experimentally using the TestU01 suite. Nonetheless, many of the weaknesses of xorshift generators fade away if their result is scrambled by a non-linear operation (as originally suggested by Marsaglia). In this paper we explore the space of possible generators obtained by multiplying the result of a xorshift generator by a suitable constant. We sample generators at 100 equispaced points of their state space and obtain detailed statistics that lead us to choices of parameters that improve on the current ones. We then explore for the first time the space of high-dimensional xorshift generators, following another suggestion in Marsaglias paper, finding choices of parameters providing periods of length $2^{1024} - 1$ and $2^{4096} - 1$. The resulting generators are of extremely high quality, faster than current similar alternatives, and generate long-period sequences passing strong statistical tests using only eight logical operations, one addition and one multiplication by a constant.
Linear pseudorandom number generators are very popular due to their high speed, to the ease with which generators with a sizable state space can be created, and to their provable theoretical properties. However, they suffer from linear artifacts which show as failures in linearity-related statistical tests such as the binary-rank and the linear-complexity test. In this paper, we give three new contributions. First, we introduce two new linear transformations that have been handcrafted to have good statistical properties and at the same time to be programmable very efficiently on superscalar processors, or even directly in hardware. Then, we describe a new test for Hamming-weight dependencies that is able to discover subtle, previously unknown biases in existing generators (in particular, in linear ones). Finally, we describe a number of scramblers, that is, nonlinear functions applied to the state array that reduce or delete the linear artifacts, and propose combinations of linear transformations and scramblers that give extremely fast pseudorandom generators of high quality. A novelty in our approach is that we use ideas from the theory of filtered linear-feedback shift register to prove some properties of our scramblers, rather than relying purely on heuristics. In the end, we provide simple, extremely fast generators that use a few hundred bits of memory, have provable properties and pass very strong statistical tests.
Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by varying degrees of memory and computational intensity, are evaluated in the context of finite difference methods for fluid dynamics problems. It is shown that, by storing some of the evaluated derivatives as single thread- or process-local variables in memory, or recomputing the derivatives on-the-fly, a speed-up of ~2 can be obtained compared to traditional algorithms that store all derivatives in global arrays.
We present a number of new results about range searching for colored (or categorical) data: 1. For a set of $n$ colored points in three dimensions, we describe randomized data structures with $O(nmathop{rm polylog}n)$ space that can report the distinct colors in any query orthogonal range (axis-aligned box) in $O(kmathop{rm polyloglog} n)$ expected time, where $k$ is the number of distinct colors in the range, assuming that coordinates are in ${1,ldots,n}$. Previous data structures require $O(frac{log n}{loglog n} + k)$ query time. Our result also implies improvements in higher constant dimensions. 2. Our data structures can be adapted to halfspace ranges in three dimensions (or circular ranges in two dimensions), achieving $O(klog n)$ expected query time. Previous data structures require $O(klog^2n)$ query time. 3. For a set of $n$ colored points in two dimensions, we describe a data structure with $O(nmathop{rm polylog}n)$ space that can answer colored type-2 range counting queries: report the number of occurrences of every distinct color in a query orthogonal range. The query time is $O(frac{log n}{loglog n} + kloglog n)$, where $k$ is the number of distinct colors in the range. Naively performing $k$ uncolored range counting queries would require $O(kfrac{log n}{loglog n})$ time. Our data structures are designed using a variety of techniques, including colored variants of randomized incremental construction (which may be of independent interest), colored variants of shallow cuttings, and bit-packing tricks.
Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum length subsequence of a given string that contains at most one run for each symbol (a run is a maximum substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet on which the input string is defined. In this paper we further investigate the complexity of the problem and we show that it is fixed-parameter tractable when it is parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and we prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence when each symbol has at most two occurrences in the input string and we show that it is APX-hard.