Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries

120 0 0.0 ( 0 )

Download Cite

Added by Daniel Lemire

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Daniel Lemire - Owen Kaser - Nathan Kurz

Mathematical Software Performance

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

On common processors, integer multiplication is many times faster than integer division. Dividing a numerator n by a divisor d is mathematically equivalent to multiplication by the inverse of the divisor (n / d = n x 1/d). If the divisor is known in advance---or if repeated integer divisions will be performed with the same divisor---it can be beneficial to substitute a less costly multiplication for an expensive division. Currently, the remainder of the division by a constant is computed from the quotient by a multiplication and a subtraction. But if just the remainder is desired and the quotient is unneeded, this may be suboptimal. We present a generally applicable algorithm to compute the remainder more directly. Specifically, we use the fractional portion of the product of the numerator and the inverse of the divisor. On this basis, we also present a new, simpler divisibility algorithm to detect nonzero remainders. We also derive new tight bounds on the precision required when representing the inverse of the divisor. Furthermore, we present simple C implementations that beat the optimized code produced by state-of-art C compilers on recent x64 processors (e.g., Intel Skylake and AMD Ryzen), sometimes by more than 25%. On all tested platforms including 64-bit ARM and POWER8, our divisibility-test functions are faster than state-of-the-art Granlund-Montgomery divisibility-test functions, sometimes by more than 50%.

rate research

Toward Efficient Interactions between Python and Native Libraries

232 - Jialiang Tan , Yu Chen , Zhenming Liu 2021

Python has become a popular programming language because of its excellent programmability. Many modern software packages utilize Python for high-level algorithm design and depend on native libraries written in C/C++/Fortran for efficient computation kernels. Interaction between Python code and native libraries introduces performance losses because of the abstraction lying on the boundary of Python and native libraries. On the one side, Python code, typically run with interpretation, is disjoint from its execution behavior. On the other side, native libraries do not include program semantics to understand algorithm defects. To understand the interaction inefficiencies, we extensively study a large collection of Python software packages and categorize them according to the root causes of inefficiencies. We extract two inefficiency patterns that are common in interaction inefficiencies. Based on these patterns, we develop PieProf, a lightweight profiler, to pinpoint interaction inefficiencies in Python applications. The principle of PieProf is to measure the inefficiencies in the native execution and associate inefficiencies with high-level Python code to provide a holistic view. Guided by PieProf, we optimize 17 real-world applications, yielding speedups up to 6.3$times$ on application level.

Programming Languages Performance

ZKCM: a C++ library for multiprecision matrix computation with applications in quantum information

553 - Akira SaiToh 2013

ZKCM is a C++ library developed for the purpose of multiprecision matrix computation, on the basis of the GNU MP and MPFR libraries. It provides an easy-to-use syntax and convenient functions for matrix manipulations including those often used in numerical simulations in quantum physics. Its extension library, ZKCM_QC, is developed for simulating quantum computing using the time-dependent matrix-product-state simulation method. This paper gives an introduction about the libraries with practical sample programs.

Mathematical Software Computational Physics Quantum Physics

LAGraph: Linear Algebra, Network Analysis Libraries, and the Study of Graph Algorithms

68 - Gabor Szarnyas , David A. Bader , Timothy A. Davis 2021

Graph algorithms can be expressed in terms of linear algebra. GraphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAGraph builds on top of the GraphBLAS to target users of graph algorithms with high-level algorithms common in network analysis. In this paper, we describe the first release of the LAGraph library, the design decisions behind the library, and performance using the GAP benchmark suite. LAGraph, however, is much more than a library. It is also a project to document and analyze the full range of algorithms enabled by the GraphBLAS. To that end, we have developed a compact and intuitive notation for describing these algorithms. In this paper, we present that notation with examples from the GAP benchmark suite.

Mathematical Software Data Structures and Algorithms

Faster arbitrary-precision dot product and matrix multiplication

74 - Fredrik Johansson 2019

We present algorithms for real and complex dot product and matrix multiplication in arbitrary-precision floating-point and ball arithmetic. A low-overhead dot product is implemented on the level of GMP limb arrays; it is about twice as fast as previous code in MPFR and Arb at precision up to several hundred bits. Up to 128 bits, it is 3-4 times as fast, costing 20-30 cycles per term for floating-point evaluation and 40-50 cycles per term for balls. We handle large matrix multiplications even more efficiently via blocks of scaled integer matrices. The new methods are implemented in Arb and significantly speed up polynomial operations and linear algebra.

Mathematical Software Numerical Analysis

A Faster, More Intuitive RooFit

89 - Stephan Hageboeck 2020

RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider as well as at $B$ factories. Larger datasets to be collected at e.g. the High-Luminosity LHC will enable measurements with higher precision, but will require faster data processing to keep fitting times stable. In this work, a simplification of RooFits interfaces and a redesign of its internal dataflow is presented. Interfaces are being extended to look and feel more STL-like to be more accessible both from C++ and Python to improve interoperability and ease of use, while maintaining compatibility with old code. The redesign of the dataflow improves cache locality and data loading, and can be used to process batches of data with vectorised SIMD computations. This reduces the time for computing unbinned likelihoods by a factor four to 16. This will allow to fit larger datasets of the future in the same time or faster than todays fits.

Mathematical Software High Energy Physics - Experiment Data Analysis Statistics and Probability

comments

Fetching comments

American University of Beirut

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries

Ask ChatGPT about the research

No Arabic abstract

Read More