No Arabic abstract
We present an optimized algorithm calculating determinant for multivariate polynomial matrix on GPU. The novel algorithm provides precise determinant for input multivariate polynomial matrix in controllable time. Our approach is based on modular methods and split into Fast Fourier Transformation, Condensation method and Chinese Remainder Theorem where each algorithm is paralleled on GPU. The experiment results show that our parallel method owns substantial speedups compared to Maple, allowing memory overhead and time expedition in steady increment. We are also able to deal with complex matrix which is over the threshold on Maple and constrained on CPU. In addition, calculation during the process could be recovered without losing accuracy at any point regardless of disruptions. Furthermore, we propose a time prediction for calculation of polynomial determinant according to some basic matrix attributes and we solve an open problem relating to harmonic elimination equations on the basis of our GPU implementation.
In this paper, we study how to quickly compute the <-minimal monomial interpolating basis for a multivariate polynomial interpolation problem. We address the notion of reverse reduced basis of linearly independent polynomials and design an algorithm for it. Based on the notion, for any monomial ordering we present a new method to read off the <-minimal monomial interpolating basis from monomials appearing in the polynomials representing the interpolation conditions.
Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-stage variants, replacing the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When they perform the same number of flops for SpMV (e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, %(e.g. using tuned damping factors), these were applied to incompressible fluid flow simulations on GPUs.
For $m,n in mathbb{N}$, $mgeq 1$ and a given function $f : mathbb{R}^mlongrightarrow mathbb{R}$ the polynomial interpolation problem (PIP) is to determine a emph{generic node set} $P subseteq mathbb{R}^m$ and the coefficients of the uniquely defined polynomial $Qinmathbb{R}[x_1,dots,x_m]$ in $m$ variables of degree $mathrm{deg}(Q)leq n in mathbb{N}$ that fits $f$ on $P$, i.e., $Q(p) = f(p)$, $forall, p in P$. We here show that in general, i.e., for arbitrary $m,n in mathbb{N}$, $m geq 1$, there exists an algorithm that determines $P$ and computes the $N(mbox{m,n})=#P$ coefficients of $Q$ in $mathcal{O}big(N(mbox{m,n})^2big)$ time using $mathcal{O}big(mbox{m}N(mbox{m,n})big)$ storage, without inverting the occurring Vandermonde matrix. We provide such an algorithm, termed PIP-SOLVER, based on a recursive decomposition of the problem and prove its correctness. Since the present approach solves the PIP without matrix inversion, it is computationally more efficient and numerically more robust than previous approaches. We demonstrate this in numerical experiments and compare with previous approaches based on matrix inversion and linear systems solving.
The linear equations that arise in interior methods for constrained optimization are sparse symmetric indefinite and become extremely ill-conditioned as the interior method converges. These linear systems present a challenge for existing solver frameworks based on sparse LU or LDL^T decompositions. We benchmark five well known direct linear solver packages using matrices extracted from power grid optimization problems. The achieved solution accuracy varies greatly among the packages. None of the tested packages delivers significant GPU acceleration for our test cases.
We investigate multivariate integration for a space of infinitely times differentiable functions $mathcal{F}_{s, boldsymbol{u}} := {f in C^infty [0,1]^s mid | f |_{mathcal{F}_{s, boldsymbol{u}}} < infty }$, where $| f |_{mathcal{F}_{s, boldsymbol{u}}} := sup_{boldsymbol{alpha} = (alpha_1, dots, alpha_s) in mathbb{N}_0^s} |f^{(boldsymbol{alpha})}|_{L^1}/prod_{j=1}^s u_j^{alpha_j}$, $f^{(boldsymbol{alpha})} := frac{partial^{|boldsymbol{alpha}|}}{partial x_1^{alpha_1} cdots partial x_s^{alpha_s}}f$ and $boldsymbol{u} = {u_j}_{j geq 1}$ is a sequence of positive decreasing weights. Let $e(n,s)$ be the minimal worst-case error of all algorithms that use $n$ function values in the $s$-variate case. We prove that for any $boldsymbol{u}$ and $s$ considered $e(n,s) leq C(s) exp(-c(s)(log{n})^2)$ holds for all $n$, where $C(s)$ and $c(s)$ are constants which may depend on $s$. Further we show that if the weights $boldsymbol{u}$ decay sufficiently fast then there exist some $1 < p < 2$ and absolute constants $C$ and $c$ such that $e(n,s) leq C exp(-c(log{n})^p)$ holds for all $s$ and $n$. These bounds are attained by quasi-Monte Carlo integration using digital nets. These convergence and tractability results come from those for the Walsh space into which $mathcal{F}_{s, boldsymbol{u}}$ is embedded.