UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: extra cost, improve transformer, effective technique

Abstract

The Transformer architecture has achieved great success in many natural language processing tasks. The over-parameterization of the Transformer model has motivated many works that aim to alleviate its overfitting and improve performance. Through exploration, we find that simple techniques such as dropout can greatly boost model performance when carefully designed. Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models. Specifically, we propose an approach named UniDrop that unites three different dropout techniques from fine-grained to coarse-grained, i.e., feature dropout, structure dropout, and data dropout. Theoretically, we demonstrate that these three dropouts play different roles from regularization perspectives. Empirically, we conduct experiments on both neural machine translation and text classification benchmark datasets. Extensive results indicate that Transformer with UniDrop can achieve around 1.5 BLEU improvement on IWSLT14 translation tasks, and better accuracy for classification even when using a strong pre-trained RoBERTa as the backbone.
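The three granularities named in the abstract can be illustrated with a small, framework-free Python sketch. This is not the authors' implementation: the abstract does not say where each dropout is applied or which rates are used, so the function names, the drop rates, and the choice to model structure dropout as skipping whole layers and data dropout as removing input tokens are all assumptions made for illustration.

```python
import random


def feature_dropout(features, p, rng):
    """Fine-grained: zero individual feature values with probability p
    and rescale the survivors by 1 / (1 - p) (inverted dropout)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in features]


def structure_dropout(layers, p, rng):
    """Coarser: skip each whole layer with probability p.
    (Assumption: structure dropout is modeled here as layer skipping.)"""
    return [layer for layer in layers if rng.random() >= p]


def data_dropout(tokens, p, rng):
    """Coarsest: remove each input token with probability p.
    (Assumption: at least one token is kept for non-empty input.)"""
    kept = [t for t in tokens if rng.random() >= p]
    if not kept and tokens:
        return [rng.choice(tokens)]
    return kept


def forward_train(tokens, embed, layers, rng,
                  p_feat=0.1, p_layer=0.2, p_data=0.1):
    """Hypothetical training-time forward pass combining all three.
    `embed` maps a token to a list of floats; each layer maps a list
    of floats to a list of floats. Both are placeholders."""
    tokens = data_dropout(tokens, p_data, rng)
    hidden = [feature_dropout(embed(t), p_feat, rng) for t in tokens]
    for layer in structure_dropout(layers, p_layer, rng):
        hidden = [feature_dropout(layer(h), p_feat, rng) for h in hidden]
    return hidden
```

At inference time all three rates would be set to zero, in which case `forward_train` reduces to a plain pass through every layer on the full input.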

References used

https://aclanthology.org/

A Simple yet Effective Method for Sentence Ordering

711 - Association for Computational Linguistics 2021 (article)

Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text. In this work, we propose a simple yet effective training method that improves the capacity of models to capture overall text coherence based on training over pairs of sentences/segments. Experimental results show the superiority of our proposed method in in- and cross-domain settings. The utility of our method is also verified over a multi-document summarisation task.

Keywords: sentence ordering, simple yet effective, effective training method

Learning Numeracy: A Simple Yet Effective Number Embedding Approach Using Knowledge Graph

852 - Association for Computational Linguistics 2021 (article)

Numeracy plays a key role in natural language understanding. However, existing NLP approaches, including both the traditional word2vec approach and contextualized transformer-based language models, fail to learn numeracy. As a result, the performance of these models is limited when they are applied to number-intensive applications in clinical and financial domains. In this work, we propose a simple number embedding approach based on a knowledge graph. We construct a knowledge graph consisting of number entities and magnitude relations. A knowledge graph embedding method is then applied to obtain number vectors. Our approach is easy to implement, and experimental results on various numeracy-related NLP tasks demonstrate the effectiveness and efficiency of our method.

Keywords: effective number embedding, effective number

Cost-effective End-to-end Information Extraction for Semi-structured Document Images

660 - Association for Computational Linguistics 2021 (article)

A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost. One can instead consider an end-to-end model that directly maps the input to the target output and simplifies the entire process. However, such a generation approach is known to lead to unstable performance if not designed carefully. Here we present our recent effort on transitioning from our existing pipeline-based IE system to an end-to-end system, focusing on practical challenges that are associated with replacing and deploying the system in real, large-scale production. By carefully formulating document IE as a sequence generation task, we show that a single end-to-end IE system can be built and still achieve competent performance.

Keywords: semi-structured document images, semi-structured document

Bag of Tricks for Optimizing Transformer Efficiency

693 - Association for Computational Linguistics 2021 (article)

Improving Transformer efficiency has become increasingly attractive recently. A wide range of methods has been proposed, e.g., pruning, quantization, and new architectures. But these methods are either sophisticated in implementation or dependent on hardware. In this paper, we show that the efficiency of Transformer can be improved by combining some simple and hardware-agnostic methods, including tuning hyper-parameters, better design choices and training strategies. On the WMT news translation tasks, we improve the inference efficiency of a strong Transformer system by 3.80x on CPU and 2.52x on GPU.

Keywords: tricks for optimizing, bag of tricks, optimizing transformer efficiency

Cost-effective Deployment of BERT Models in Serverless Environment

551 - Association for Computational Linguistics 2021 (article)

In this study, we demonstrate the viability of deploying BERT-style models to AWS Lambda in a production environment. Since the freely available pre-trained models are too large to be deployed in this environment, we utilize knowledge distillation and fine-tune the models on proprietary datasets for two real-world tasks: sentiment analysis and semantic textual similarity. As a result, we obtain models that are tuned for a specific domain and deployable in the serverless environment. The subsequent performance analysis shows that this solution not only reports latency levels acceptable for production use but is also a cost-effective alternative to small-to-medium size deployments of BERT models, all without any infrastructure overhead.

Keywords: serverless environment, AWS Lambda
