ترغب بنشر مسار تعليمي؟ اضغط هنا

Custom-Precision Mathematical Library Explorations for Code Profiling and Optimization

57   0   0.0 ( 0 )
 نشر من قبل Matei Istoan
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف David Defour




اسأل ChatGPT حول البحث

The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to address the increasing energy consumption and throughput requirements of scientific applications, library and hardware designers are moving beyond this one-size-fits-all approach. In this article we propose to study the effects and benefits of using user-defined floating-point formats and target accuracies in calculations involving mathematical functions. Our tool collects input-data profiles and iteratively explores lower precisions for each call-site of a mathematical function in user applications. This profiling data will be a valuable asset for specializing and fine-tuning mathematical function implementations for a given application. We demonstrate the tools capabilities on SGP4, a satellite tracking application. The profile data shows the potential for specialization and provides insight into answering where it is useful to provide variable-precision designs for elementary function evaluation.



قيم البحث

اقرأ أيضاً

61 - Oleg Smirnov 2021
The adoption of neural networks and deep learning in non-Euclidean domains has been hindered until recently by the lack of scalable and efficient learning frameworks. Existing toolboxes in this space were mainly motivated by research and education us e cases, whereas practical aspects, such as deploying and maintaining machine learning models, were often overlooked. We attempt to bridge this gap by proposing TensorFlow RiemOpt, a Python library for optimization on Riemannian manifolds in TensorFlow. The library is designed with the aim for a seamless integration with the TensorFlow ecosystem, targeting not only research, but also streamlining production machine learning pipelines.
This paper presents a 55-line code written in python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software (FEniCS), equipped with various finite element tools and solvers. PETSc is used as the linear alg ebra back-end, which results in significantly less computational time than standard python libraries. The code is designed based on the popular solid isotropic material with penalization (SIMP) methodology. Extensions to multiple load cases, different boundary conditions, and incorporation of passive elements are also presented. Thus, this implementation is the most compact implementation of SIMP based topology optimization for 3D as well as 2D problems. Utilizing the concept of Euclidean distance matrix to vectorize the computation of the weight matrix for the filter, we have achieved a substantial reduction in the computational time and have also made it possible for the code to work with complex ground structure configurations. We have also presented the codes extension to large-scale topology optimization problems with support for parallel computations on complex structural configuration, which could help students and researchers explore novel insights into the TO problem with dense meshes. Appendix-A contains the complete code, and the website: url{https://github.com/iitrabhi/topo-fenics} also contains the complete code.
130 - Akira SaiToh 2011
A C++ library, named ZKCM, has been developed for the purpose of multiprecision matrix calculations, which is based on the GNU MP and MPFR libraries. It is especially convenient for writing programs involving tensor-product operations, tracing-out op erations, and singular-value decompositions. Its extension library, ZKCM_QC, for simulating quantum computing has been developed using the time-dependent matrix-product-state simulation method. This report gives a brief introduction to the libraries with sample programs.
We describe in this paper new design techniques used in the cpp exact linear algebra library linbox, intended to make the library safer and easier to use, while keeping it generic and efficient. First, we review the new simplified structure for conta iners, based on our emph{founding scope allocation} model. We explain design choices and their impact on coding: unification of our matrix classes, clearer model for matrices and submatrices, etc Then we present a variation of the emph{strategy} design pattern that is comprised of a controller--plugin system: the controller (solution) chooses among plug-ins (algorithms) that always call back the controllers for subtasks. We give examples using the solution mul. Finally we present a benchmark architecture that serves two purposes: Providing the user with easier ways to produce graphs; Creating a framework for automatically tuning the library and supporting regression testing.
Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a hi ghly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to to streamline the computation of neural layers and to reduce memory footprint. LightSeq can easily import models trained using PyTorch and Tensorflow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x compared with FasterTransformer, a concurrent CUDA implementation. The code is available at https://github.com/bytedance/lightseq.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا