LightSeq: A High Performance Inference Library for Transformers

Added by Association for Computational Linguistics, article

Publication date 2021

fields Artificial Intelligence

The research language is English

Created by Shamra Editor

high performance inference, high performance, performance inference library

Abstract

The Transformer and its variants have achieved great success in natural language processing. Because Transformer models are very large, serving them is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques that both streamline the computation of Transformer layers and reduce the memory footprint. LightSeq supports models trained using PyTorch and TensorFlow. Experimental results on standard machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with a concurrent CUDA implementation. The code will be released publicly after the review.

References used

https://aclanthology.org/

TenTrans High-Performance Inference Toolkit for WMT2021 Efficiency Task

406 - Association for Computational Linguistics 2021 article

This paper describes the TenTrans submissions to the WMT 2021 Efficiency Shared Task. We explore training a variety of smaller, compact transformer models using the teacher-student setup. Our model is trained with our self-developed open-source multilingual training platform, TenTrans-Py. We also release an open-source high-performance inference toolkit for transformer models, written entirely in C++. All additional optimizations are built on top of the inference engine, including attention caching, kernel fusion, early stopping, and several others. In our submissions, the fastest system can translate more than 22,000 tokens per second on a single Tesla P4 while maintaining 38.36 BLEU on En-De newstest2019. Our trained models and more details are available in the TenTrans-Decoding competition examples.

efficiency sharing, efficiency task

HPC: High Performance Computing

1357 - Damascus University 2018 research seminar

This paper studies the latest advancements in high performance computing (HPC) technologies, which provide suitable environments, solid infrastructure, and software and hardware components that allow scientists and researchers to solve problems in math, biology, machine learning, physics simulation, and numerous other areas, enabling significant breakthroughs in these fields.

HPC High Performance Computing CUDA MPI IBM Roadrunner

A New Hybrid Digital Signature Algorithm with high Security and Performance

1096 - Al-Baath University 2016 research paper

The majority of recent digital signature algorithms depend, in their structure, on complicated mathematical concepts that require a long time and significant computational effort to execute. In an attempt to reduce these problems, some researchers have proposed digital signature algorithms that depend on simple arithmetic functions and operations that execute quickly, but this came at the expense of the algorithms' security.

Digital Signature, public key encryption, RSA, digital signature algorithm, Diffie-Hellman, key exchange algorithm

Consistent Accelerated Inference via Confident Adaptive Transformers

550 - Association for Computational Linguistics 2021 article

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase efficiency, but can come with unpredictable performance costs. In this work, we present CATs -- Confident Adaptive Transformers -- in which we simultaneously increase computational efficiency while guaranteeing a specifiable degree of consistency with the original model with high confidence. Our method trains additional prediction heads on top of intermediate layers, and dynamically decides when to stop allocating computational effort to each input using a meta consistency classifier. To calibrate our early prediction stopping rule, we formulate a unique extension of conformal prediction. We demonstrate the effectiveness of this approach on four classification and regression tasks.

consistent accelerated inference, confident adaptive transformers, consistent accelerated

A Simple and Effective Positional Encoding for Transformers

459 - Association for Computational Linguistics 2021 article

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works have proposed variations of positional encodings, with relative position encodings achieving better performance. Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer. Motivated by this, we introduce Decoupled Positional Attention for Transformers (DIET), a simple yet effective mechanism to encode position and segment information into Transformer models. The proposed method has faster training and inference time, while achieving competitive performance on the GLUE, XTREME, and WMT benchmarks. We further generalize our method to long-range transformers and show performance gains.

effective positional encoding, transformer models
