ﻻ يوجد ملخص باللغة العربية
Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to to streamline the computation of neural layers and to reduce memory footprint. LightSeq can easily import models trained using PyTorch and Tensorflow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x compared with FasterTransformer, a concurrent CUDA implementation. The code is available at https://github.com/bytedance/lightseq.
A C++ library, named ZKCM, has been developed for the purpose of multiprecision matrix calculations, which is based on the GNU MP and MPFR libraries. It is especially convenient for writing programs involving tensor-product operations, tracing-out op
The adoption of neural networks and deep learning in non-Euclidean domains has been hindered until recently by the lack of scalable and efficient learning frameworks. Existing toolboxes in this space were mainly motivated by research and education us
In this paper, we develop face.evoLVe -- a comprehensive library that collects and implements a wide range of popular deep learning-based methods for face recognition. First of all, face.evoLVe is composed of key components that cover the full proces
We present cudaclaw, a CUDA-based high performance data-parallel framework for the solution of multidimensional hyperbolic partial differential equation (PDE) systems, equations describing wave motion. cudaclaw allows computational scientists to solv
Dijkstra observed that verifying correctness of a program is difficult and conjectured that derivation of a program hand-in-hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to the domain of d