New community

Subscribe to the gold package and get unlimited access to Shamra Academy

LightSeq: A High Performance Inference Library for Transformers

77 0 0.0 ( 0 )

Download Cite

Added by Yang Wei

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Xiaohui Wang - Ying Xiong - Yang Wei

Mathematical Software Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to to streamline the computation of neural layers and to reduce memory footprint. LightSeq can easily import models trained using PyTorch and Tensorflow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x compared with FasterTransformer, a concurrent CUDA implementation. The code is available at https://github.com/bytedance/lightseq.

rate research

A multiprecision matrix calculation library and its extension library for a matrix-product-state simulation of quantum computing

134 - Akira SaiToh 2011

A C++ library, named ZKCM, has been developed for the purpose of multiprecision matrix calculations, which is based on the GNU MP and MPFR libraries. It is especially convenient for writing programs involving tensor-product operations, tracing-out operations, and singular-value decompositions. Its extension library, ZKCM_QC, for simulating quantum computing has been developed using the time-dependent matrix-product-state simulation method. This report gives a brief introduction to the libraries with sample programs.

Mathematical Software Quantum Physics

TensorFlow RiemOpt: a library for optimization on Riemannian manifolds

61 - Oleg Smirnov 2021

The adoption of neural networks and deep learning in non-Euclidean domains has been hindered until recently by the lack of scalable and efficient learning frameworks. Existing toolboxes in this space were mainly motivated by research and education use cases, whereas practical aspects, such as deploying and maintaining machine learning models, were often overlooked. We attempt to bridge this gap by proposing TensorFlow RiemOpt, a Python library for optimization on Riemannian manifolds in TensorFlow. The library is designed with the aim for a seamless integration with the TensorFlow ecosystem, targeting not only research, but also streamlining production machine learning pipelines.

Mathematical Software Computational Geometry Machine Learning

Face.evoLVe: A High-Performance Face Recognition Library

100 - Qingzhong Wang , Pengfei Zhang , Haoyi Xiong 2021

In this paper, we develop face.evoLVe -- a comprehensive library that collects and implements a wide range of popular deep learning-based methods for face recognition. First of all, face.evoLVe is composed of key components that cover the full process of face analytics, including face alignment, data processing, various backbones, losses, and alternatives with bags of tricks for improving performance. Later, face.evoLVe supports multi-GPU training on top of different deep learning platforms, such as PyTorch and PaddlePaddle, which facilitates researchers to work on both large-scale datasets with millions of images and low-shot counterparts with limited well-annotated data. More importantly, along with face.evoLVe, images before & after alignment in the common benchmark datasets are released with source codes and trained models provided. All these efforts lower the technical burdens in reproducing the existing methods for comparison, while users of our library could focus on developing advanced approaches more efficiently. Last but not least, face.evoLVe is well designed and vibrantly evolving, so that new face recognition approaches can be easily plugged into our framework. Note that we have used face.evoLVe to participate in a number of face recognition competitions and secured the first place. The version that supports PyTorch is publicly available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch and the PaddlePaddle version is available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/tree/master/paddle. Face.evoLVe has been widely used for face analytics, receiving 2.4K stars and 622 forks.

Computer Vision and Pattern Recognition Artificial Intelligence Multimedia

CUDACLAW: A high-performance programmable GPU framework for the solution of hyperbolic PDEs

114 - H. Gorune Ohannessian , George Turkiyyah , Aron Ahmadia 2018

We present cudaclaw, a CUDA-based high performance data-parallel framework for the solution of multidimensional hyperbolic partial differential equation (PDE) systems, equations describing wave motion. cudaclaw allows computational scientists to solve such systems on GPUs without being burdened by the need to write CUDA code, worry about thread and block details, data layout, and data movement between the different levels of the memory hierarchy. The user defines the set of PDEs to be solved via a CUDA- independent serial Riemann solver and the framework takes care of orchestrating the computations and data transfers to maximize arithmetic throughput. cudaclaw treats the different spatial dimensions separately to allow suitable block sizes and dimensions to be used in the different directions, and includes a number of optimizations to minimize access to global memory.

Mathematical Software Numerical Analysis Numerical Analysis

Deriving Correct High-Performance Algorithms

67 - Devangi N. Parikh , Maggie E. Myers , Robert A. van de Geijn 2017

Dijkstra observed that verifying correctness of a program is difficult and conjectured that derivation of a program hand-in-hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to the domain of dense linear algebra libraries for distributed memory parallel computers. We show that algorithms that underlie the implementation of most functionality for this domain can be systematically derived to be correct. The benefit is that an entire family of algorithms for an operation is discovered so that the best algorithm for a given architecture can be chosen. This approach is very practical: Ideas inspired by it have been used to rewrite the dense linear algebra software stack starting below the Basic Linear Algebra Subprograms (BLAS) and reaching up through the Elemental distributed memory library, and every level in between. The paper demonstrates how formal methods and rigorous mathematical techniques for correctness impact HPC.

Mathematical Software

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

LightSeq: A High Performance Inference Library for Transformers

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions