
Custom-Precision Mathematical Library Explorations for Code Profiling and Optimization

Submitted by Matei Istoan
Publication date: 2020
Research field: Informatics Engineering
Paper language: English
Authored by David Defour





The typical processors used for scientific computing have fixed-width data-paths. As a result, mathematical libraries have been developed specifically to target each of these fixed precisions (binary16, binary32, binary64). However, to address the increasing energy consumption and throughput requirements of scientific applications, library and hardware designers are moving beyond this one-size-fits-all approach. In this article we propose to study the effects and benefits of using user-defined floating-point formats and target accuracies in calculations involving mathematical functions. Our tool collects input-data profiles and iteratively explores lower precisions for each call-site of a mathematical function in user applications. This profiling data will be a valuable asset for specializing and fine-tuning mathematical function implementations for a given application. We demonstrate the tool's capabilities on SGP4, a satellite tracking application. The profile data shows the potential for specialization and provides insight into where variable-precision designs for elementary function evaluation are useful.
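As a rough illustration of the exploration loop described above (not the paper's actual tool; the function and parameter names below are hypothetical), one can emulate a reduced-precision call-site by truncating binary64 values to a narrower mantissa and sweeping widths until a target accuracy is met on the profiled inputs:

import math
import struct

def round_mantissa(x, mant_bits):
    # Truncate a binary64 value to `mant_bits` of its 52-bit mantissa,
    # a crude emulation of a narrower floating-point format.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    bits &= ~((1 << (52 - mant_bits)) - 1)
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def smallest_sufficient_precision(func, samples, target_rel_error=1e-6):
    # Sweep mantissa widths and return the narrowest one whose worst-case
    # relative error over the profiled inputs meets the target accuracy.
    for mant_bits in range(8, 53):
        worst = 0.0
        for x in samples:
            ref = func(x)
            approx = round_mantissa(func(round_mantissa(x, mant_bits)), mant_bits)
            if ref != 0.0:
                worst = max(worst, abs(approx - ref) / abs(ref))
        if worst <= target_rel_error:
            return mant_bits
    return 52

# Hypothetical input profile of one sin() call-site.
inputs = [i * 0.01 for i in range(1, 1000)]
print(smallest_sufficient_precision(math.sin, inputs))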




Read also

61 - Oleg Smirnov 2021
The adoption of neural networks and deep learning in non-Euclidean domains has been hindered until recently by the lack of scalable and efficient learning frameworks. Existing toolboxes in this space were mainly motivated by research and education use cases, whereas practical aspects, such as deploying and maintaining machine learning models, were often overlooked. We attempt to bridge this gap by proposing TensorFlow RiemOpt, a Python library for optimization on Riemannian manifolds in TensorFlow. The library is designed for seamless integration with the TensorFlow ecosystem, targeting not only research but also streamlining production machine learning pipelines.
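The abstract does not detail RiemOpt's API; as a generic illustration of the kind of update step such a library automates, here is a small NumPy sketch of Riemannian gradient descent on the unit sphere (the names and the example problem are ours, not the library's):

import numpy as np

def riemannian_gd_sphere(grad_f, x0, lr=0.1, steps=100):
    # Riemannian gradient descent on the unit sphere: project the Euclidean
    # gradient onto the tangent space at x, step, then retract by renormalizing.
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        g = grad_f(x)
        tangent = g - np.dot(g, x) * x   # tangent-space projection
        x = x - lr * tangent
        x = x / np.linalg.norm(x)        # retraction back onto the manifold
    return x

# Example: leading eigenvector of A by minimizing -x^T A x on the sphere.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
grad = lambda x: -2.0 * A @ x
print(riemannian_gd_sphere(grad, np.array([1.0, 0.5])))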
This paper presents a 55-line code written in Python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software FEniCS, equipped with various finite element tools and solvers. PETSc is used as the linear algebra back-end, which results in significantly less computational time than standard Python libraries. The code is designed around the popular solid isotropic material with penalization (SIMP) methodology. Extensions to multiple load cases, different boundary conditions, and the incorporation of passive elements are also presented. Thus, this implementation is the most compact implementation of SIMP-based topology optimization for both 2D and 3D problems. Utilizing the concept of the Euclidean distance matrix to vectorize the computation of the weight matrix for the filter, we have achieved a substantial reduction in computational time and have also made it possible for the code to work with complex ground-structure configurations. We also present the code's extension to large-scale topology optimization problems with support for parallel computations on complex structural configurations, which could help students and researchers explore novel insights into the TO problem with dense meshes. Appendix A contains the complete code, which is also available at https://github.com/iitrabhi/topo-fenics.
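The complete 55-line implementation lives in the linked repository; the vectorized distance-matrix construction of the filter weights that the abstract highlights can be sketched roughly as follows (the names and the hat-function weighting are illustrative, not necessarily the paper's exact code):

import numpy as np

def filter_weights(centroids, rmin):
    # Weight matrix for a SIMP density filter, built from the full Euclidean
    # distance matrix in one vectorized step; `centroids` is (n_elements, dim).
    sq = np.sum(centroids**2, axis=1)
    # Pairwise distances via |a-b|^2 = |a|^2 + |b|^2 - 2 a.b
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * centroids @ centroids.T, 0.0))
    # Linearly decaying (hat) weights, zero outside the filter radius.
    weights = np.maximum(rmin - dist, 0.0)
    # Normalize rows so filtered densities stay in [0, 1].
    return weights / weights.sum(axis=1, keepdims=True)

# Example on a small 4x4 grid of unit elements (hypothetical mesh).
xs, ys = np.meshgrid(np.arange(4) + 0.5, np.arange(4) + 0.5)
H = filter_weights(np.column_stack([xs.ravel(), ys.ravel()]), rmin=1.5)
print(H.shape)  # (16, 16)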
132 - Akira SaiToh 2011
A C++ library, named ZKCM, has been developed for the purpose of multiprecision matrix calculations, which is based on the GNU MP and MPFR libraries. It is especially convenient for writing programs involving tensor-product operations, tracing-out operations, and singular-value decompositions. Its extension library, ZKCM_QC, for simulating quantum computing has been developed using the time-dependent matrix-product-state simulation method. This report gives a brief introduction to the libraries with sample programs.
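ZKCM itself is a C++ library built on GNU MP and MPFR; as a plain double-precision illustration of the tensor-product and tracing-out operations the abstract mentions, a NumPy sketch (not ZKCM's API) could look like this:

import numpy as np

def tensor(a, b):
    # Tensor (Kronecker) product of two operators.
    return np.kron(a, b)

def trace_out_second(rho, dim_a, dim_b):
    # Partial trace over the second subsystem of a (dim_a*dim_b) density matrix.
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum('ikjk->ij', r)

# Example: tracing one qubit out of a Bell state leaves the maximally mixed state.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(bell, bell.conj())
print(trace_out_second(rho, 2, 2))  # ~0.5 * identity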
We describe in this paper new design techniques used in the C++ exact linear algebra library LinBox, intended to make the library safer and easier to use while keeping it generic and efficient. First, we review the new simplified structure for containers, based on our founding scope allocation model. We explain design choices and their impact on coding: unification of our matrix classes, a clearer model for matrices and submatrices, etc. Then we present a variation of the strategy design pattern comprised of a controller-plugin system: the controller (solution) chooses among plug-ins (algorithms) that always call back the controller for subtasks. We give examples using the solution mul. Finally, we present a benchmark architecture that serves two purposes: providing the user with easier ways to produce graphs, and creating a framework for automatically tuning the library and supporting regression testing.
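A minimal sketch of that controller-plugin control flow, written in Python with made-up names rather than LinBox's actual classes: the controller owns the algorithm-selection policy, and the divide-and-conquer plug-in hands its subtasks back to the controller, which may pick a different plug-in for each of them.

class MulController:
    # Controller ("solution"): owns the policy that picks a plug-in.
    def __init__(self, threshold=32):
        self.threshold = threshold

    def mul(self, a, b):
        plugin = naive_mul if min(len(a), len(b)) <= self.threshold else split_mul
        return plugin(self, a, b)

def _carry(digits):
    # Normalize a little-endian base-10 digit list.
    for k in range(len(digits) - 1):
        digits[k + 1] += digits[k] // 10
        digits[k] %= 10
    return digits

def naive_mul(controller, a, b):
    # Base-case plug-in: schoolbook product of little-endian digit lists.
    out = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            out[i + j] += da * db
    return _carry(out)

def split_mul(controller, a, b):
    # Divide-and-conquer plug-in: each half-product is handed back to the
    # controller, which may select a different plug-in for the subtask.
    h = len(a) // 2
    low = controller.mul(a[:h], b)
    high = controller.mul(a[h:], b)
    out = [0] * (len(a) + len(b))
    for k, d in enumerate(low):
        out[k] += d
    for k, d in enumerate(high):
        out[h + k] += d
    return _carry(out)

print(MulController(threshold=1).mul([4, 5, 6], [2, 1]))  # 654 * 12 = 7848, little-endian digits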
Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to streamline the computation of neural layers and to reduce memory footprint. LightSeq can easily import models trained using PyTorch and TensorFlow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x compared with FasterTransformer, a concurrent CUDA implementation. The code is available at https://github.com/bytedance/lightseq.