No Arabic abstract
To interpolate a supersparse polynomial with integer coefficients, two alternative approaches are the Prony-based big prime technique, which acts over a single large finite field, or the more recently-proposed small primes technique, which reduces the unknown sparse polynomial to many low-degree dense polynomials. While the latter technique has not yet reached the same theoretical efficiency as Prony-based methods, it has an obvious potential for parallelization. We present a heuristic small primes interpolation algorithm and report on a low-level C implementation using FLINT and MPI.
Given a straight-line program whose output is a polynomial function of the inputs, we present a new algorithm to compute a concise representation of that unknown function. Our algorithm can handle any case where the unknown function is a multivariate polynomial, with coefficients in an arbitrary finite field, and with a reasonable number of nonzero terms but possibly very large degree. It is competitive with previously known sparse interpolation algorithms that work over an arbitrary finite field, and provides an improvement when there are a large number of variables.
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay as much as possible these reductions while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between size of blocks well suited to those fast variants or to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, block size has then to be dynamic. We propose and compare several block decomposition: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occur. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possbile to maintain a high efficiency.
We present sparse interpolation algorithms for recovering a polynomial with $le B$ terms from $N$ evaluations at distinct values for the variable when $le E$ of the evaluations can be erroneous. Our algorithms perform exact arithmetic in the field of scalars $mathsf{K}$ and the terms can be standard powers of the variable or Chebyshev polynomials, in which case the characteristic of $mathsf{K}$ is $ e 2$. Our algorithms return a list of valid sparse interpolants for the $N$ support points and run in polynomial-time. For standard power basis our algorithms sample at $N = lfloor frac{4}{3} E + 2 rfloor B$ points, which are fewer points than $N = 2(E+1)B - 1$ given by Kaltofen and Pernet in 2014. For Chebyshev basis our algorithms sample at $N = lfloor frac{3}{2} E + 2 rfloor B$ points, which are also fewer than the number of points required by the algorithm given by Arnold and Kaltofen in 2015, which has $N = 74 lfloor frac{E}{13} + 1 rfloor$ for $B = 3$ and $E ge 222$. Our method shows how to correct $2$ errors in a block of $4B$ points for standard basis and how to correct $1$ error in a block of $3B$ points for Chebyshev Basis.
After an introduction to the sequential version of FORM and the mechanisms behind, we report on the status of our project of parallelization. We have now a parallel version of FORM running on Cluster- and SMP-architectures. This version can be used to run arbitrary FORM programs in parallel.
This work is a comprehensive extension of Abu-Salem et al. (2015) that investigates the prowess of the Funnel Heap for implementing sums of products in the polytope method for factoring polynomials, when the polynomials are in sparse distributed representation. We exploit that the work and cache complexity of an Insert operation using Funnel Heap can be refined to de- pend on the rank of the inserted monomial product, where rank corresponds to its lifetime in Funnel Heap. By optimising on the pattern by which insertions and extractions occur during the Hensel lifting phase of the polytope method, we are able to obtain an adaptive Funnel Heap that minimises all of the work, cache, and space complexity of this phase. Additionally, we conduct a detailed empirical study confirming the superiority of Funnel Heap over the generic Binary Heap once swaps to external memory begin to take place. We demonstrate that Funnel Heap is a more efficient merger than the cache oblivious k-merger, which fails to achieve its optimal (and amortised) cache complexity when used for performing sums of products. This provides an empirical proof of concept that the overlapping approach for perform- ing sums of products using one global Funnel Heap is more suited than the serialised approach, even when the latter uses the best merging structures available.