ترغب بنشر مسار تعليمي؟ اضغط هنا

Time and space efficient generators for quasiseparable matrices

53   0   0.0 ( 0 )
 نشر من قبل Clement Pernet
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

The class of quasiseparable matrices is defined by the property that any submatrix entirely below or above the main diagonal has small rank, namely below a bound called the order of quasiseparability. These matrices arise naturally in solving PDEs for particle interaction with the Fast Multi-pole Method (FMM), or computing generalized eigenvalues. From these application fields, structured representations and algorithms have been designed in numerical linear algebra to compute with these matrices in time linear in the matrix dimension and either quadratic or cubic in the quasiseparability order. Motivated by the design of the general purpose exact linear algebra library LinBox, and by algorithmic applications in algebraic computing, we adapt existing techniques introduce novel ones to use quasiseparable matrices in exact linear algebra, where sub-cubic matrix arithmetic is available. In particular, we will show, the connection between the notion of quasiseparability and the rank profile matrix invariant, that we have introduced in 2015. It results in two new structured representations, one being a simpler variation on the hierarchically semiseparable storage, and the second one exploiting the generalized Bruhat decomposition. As a consequence, most basic operations, such as computing the quasiseparability orders, applying a vector, a block vector, multiplying two quasiseparable matrices together, inverting a quasiseparable matrix, can be at least as fast and often faster than previous existing algorithms.



قيم البحث

اقرأ أيضاً

58 - Clement Pernet 2016
The class of quasiseparable matrices is defined by a pair of bounds, called the quasiseparable orders, on the ranks of the maximal sub-matrices entirely located in their strictly lower and upper triangular parts. These arise naturally in applications , as e.g. the inverse of band matrices, and are widely used for they admit structured representations allowing to compute with them in time linear in the dimension and quadratic with the quasiseparable order. We show, in this paper, the connection between the notion of quasisepa-rability and the rank profile matrix invariant, presented in [Dumas & al. ISSAC15]. This allows us to propose an algorithm computing the quasiseparable orders (rL, rU) in time O(n^2 s^($omega$--2)) where s = max(rL, rU) and $omega$ the exponent of matrix multiplication. We then present two new structured representations, a binary tree of PLUQ decompositions, and the Bruhat generator, using respectively O(ns log n/s) and O(ns) field elements instead of O(ns^2) for the previously known generators. We present algorithms computing these representations in time O(n^2 s^($omega$--2)). These representations allow a matrix-vector product in time linear in the size of their representation. Lastly we show how to multiply two such structured matrices in time O(n^2 s^($omega$--2)).
109 - Monique Laurent 2008
In this note we prove a generalization of the flat extension theorem of Curto and Fialkow for truncated moment matrices. It applies to moment matrices indexed by an arbitrary set of monomials and its border, assuming that this set is connected to 1. When formulated in a basis-free setting, this gives an equivalent result for truncated Hankel operators.
Let $f_1,...,f_s in mathbb{K}[x_1,...,x_m]$ be a system of polynomials generating a zero-dimensional ideal $I$, where $mathbb{K}$ is an arbitrary algebraically closed field. We study the computation of matrices of traces for the factor algebra $A := CC[x_1, ..., x_m]/ I$, i.e. matrices with entries which are trace functions of the roots of $I$. Such matrices of traces in turn allow us to compute a system of multiplication matrices ${M_{x_i}|i=1,...,m}$ of the radical $sqrt{I}$. We first propose a method using Macaulay type resultant matrices of $f_1,...,f_s$ and a polynomial $J$ to compute moment matrices, and in particular matrices of traces for $A$. Here $J$ is a polynomial generalizing the Jacobian. We prove bounds on the degrees needed for the Macaulay matrix in the case when $I$ has finitely many projective roots in $mathbb{P}^m_CC$. We also extend previous results which work only for the case where $A$ is Gorenstein to the non-Gorenstein case. The second proposed method uses Bezoutian matrices to compute matrices of traces of $A$. Here we need the assumption that $s=m$ and $f_1,...,f_m$ define an affine complete intersection. This second method also works if we have higher dimensional components at infinity. A new explicit description of the generators of $sqrt{I}$ are given in terms of Bezoutians.
A surge in artificial intelligence and autonomous technologies have increased the demand toward enhanced edge-processing capabilities. Computational complexity and size of state-of-the-art Deep Neural Networks (DNNs) are rising exponentially with div erse network models and larger datasets. This growth limits the performance scaling and energy-efficiency of both distributed and embedded inference platforms. Embedded designs at the edge are constrained by energy and speed limitations of available processor substrates and processor to memory communication required to fetch the model coefficients. While many hardware accelerator and network deployment frameworks have been in development, a framework is needed to allow the variety of existing architectures, and those in development, to be expressed in critical parts of the flow that perform various optimization steps. Moreover, premature architecture-blind network selection and optimization diminish the effectiveness of schedule optimizations and hardware-specific mappings. In this paper, we address these issues by creating a cross-layer software-hardware design framework that encompasses network training and model compression that is aware of and tuned to the underlying hardware architecture. This approach leverages the available degrees of DNN structure and sparsity to create a converged network that can be partitioned and efficiently scheduled on the target hardware platform, minimizing data movement, and improving the overall throughput and energy. To further streamline the design, we leverage the high-level, flexible SoC generator platform based on RISC-V ROCC framework. This integration allows seamless extensions of the RISC-V instruction set and Chisel-based rapid generator design. Utilizing this approach, we implemented a silicon prototype in a 16 nm TSMC process node achieving record processing efficiency of up to 18 TOPS/W.
124 - Liyun Dai , Bican Xia 2012
This paper revisits an algorithm for isolating real roots of univariate polynomials based on continued fractions. It follows the work of Vincent, Uspen- sky, Collins and Akritas, Johnson and Krandick. We use some tricks, especially a new algorithm fo r computing an upper bound of positive roots. In this way, the algorithm of isolating real roots is improved. The complexity of our method for computing an upper bound of positive roots is O(n log(u+1)) where u is the optimal upper bound satisfying Theorem 3 and n is the degree of the polynomial. Our method has been implemented as a software package logcf using C++ language. For many benchmarks logcf is two or three times faster than the function RootIntervals of Mathematica. And it is much faster than another continued fractions based software CF, which seems to be one of the fastest available open software for exact real root isolation. For those benchmarks which have only real roots, logcf is much faster than Sleeve and eigensolve which are based on numerical computation.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا