Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract: Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and thus are too resource-hungry and computation-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy for this is model compression, which has attracted considerable research attention. Here, we summarize the research in compressing Transformers, focusing on the especially popular BERT model. In particular, we survey the state of the art in compression for BERT, we clarify the current best practices for compressing large-scale Transformer models, and we provide insights into the workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving lightweight, accurate, and generic NLP models.

References used

https://aclanthology.org/

An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models

406 - Association for Computational Linguistics 2021 (article)

This work demonstrates the development process of a machine learning architecture for inference that can scale to a large volume of requests. We used a BERT model that was fine-tuned for emotion analysis, returning a probability distribution of emotions given a paragraph. The model was deployed as a gRPC service on Kubernetes. Apache Spark was used to perform inference in batches by calling the service. We encountered some performance and concurrency challenges and created solutions to achieve faster running time. Starting with 200 successful inference requests per minute, we were able to achieve as high as 18 thousand successful requests per minute with the same batch job resource allocation. As a result, we successfully stored emotion probabilities for 95 million paragraphs within 96 hours.

Keywords: accelerated large-scale inference; architecture for accelerated
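The throughput figures quoted in the abstract above can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the reported numbers (200 and 18 thousand requests per minute, 95 million paragraphs, 96 hours); the constant and function names are illustrative and do not come from the paper, and it assumes one paragraph per request and a constant request rate.

```python
# Figures taken from the abstract; names are illustrative, not from the paper.
PARAGRAPHS = 95_000_000      # paragraphs scored in total
WINDOW_HOURS = 96            # reported upper bound on wall-clock time
INITIAL_RATE = 200           # successful requests per minute at the start
PEAK_RATE = 18_000           # best reported successful requests per minute


def required_rate_per_minute(total_items: int, hours: float) -> float:
    """Average rate needed to finish total_items within the given hours."""
    return total_items / (hours * 60)


def hours_to_finish(total_items: int, rate_per_minute: float) -> float:
    """Wall-clock hours needed at a constant rate (assumes one item per request)."""
    return total_items / rate_per_minute / 60


avg_rate = required_rate_per_minute(PARAGRAPHS, WINDOW_HOURS)
speedup = PEAK_RATE / INITIAL_RATE
days_at_initial = hours_to_finish(PARAGRAPHS, INITIAL_RATE) / 24
hours_at_peak = hours_to_finish(PARAGRAPHS, PEAK_RATE)

print(f"average rate needed for 96 h: {avg_rate:,.0f} requests/min")
print(f"peak vs. initial throughput:  {speedup:.0f}x")
print(f"time at the initial rate:     {days_at_initial:,.0f} days")
print(f"time at the peak rate:        {hours_at_peak:,.1f} hours")
```

Under these assumptions, finishing in 96 hours needs an average of roughly 16.5 thousand requests per minute, which sits just below the reported peak of 18 thousand, so the figures are mutually consistent. At the initial rate the same job would have taken close to a year.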

BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers

397 - Association for Computational Linguistics 2021 (article)

Transformer-based neural networks offer very good classification performance across a wide range of domains, but do not provide explanations of their predictions. While several explanation methods, including SHAP, address the problem of interpreting deep learning models, they are not adapted to operate on state-of-the-art transformer-based neural networks such as BERT. Another shortcoming of these methods is that their visualization of explanations in the form of lists of most relevant words does not take into account the sequential and structurally dependent nature of text. This paper proposes the TransSHAP method that adapts SHAP to transformer models including BERT-based text classifiers. It advances SHAP visualizations by showing explanations in a sequential manner, assessed by human evaluators as competitive with state-of-the-art solutions.

Keywords: bert meets shapley; extending shap explanations; meets shapley

Improving Embedding-based Large-scale Retrieval via Label Enhancement

541 - Association for Computational Linguistics 2021 (article)

Current embedding-based large-scale retrieval models are trained with 0-1 hard labels that indicate whether a query is relevant to a document, ignoring rich information about the degree of relevance. This paper proposes to improve embedding-based retrieval from the perspective of better characterizing the query-document relevance degree by introducing label enhancement (LE) for the first time. To generate label distributions in the retrieval scenario, we design a novel and effective supervised LE method that incorporates prior knowledge from dynamic term weighting methods into contextual embeddings. By training models with the generated label distribution as auxiliary supervision information, our method significantly outperforms four competitive existing retrieval models and its counterparts equipped with two alternative LE techniques. The superiority can be easily observed on English and Chinese large-scale retrieval tasks under both standard and cold-start settings.

Keywords: improving embedding-based large-scale; current embedding-based large-scale

Large-Scale Contextualised Language Modelling for Norwegian

437 - Association for Computational Linguistics 2021 (article)

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu

Keywords: contextualised language modelling; modelling for norwegian; contextualised language models

QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval

498 - Association for Computational Linguistics 2021 (article)

The embedding-based large-scale query-document retrieval problem is a hot topic in the information retrieval (IR) field. Considering that pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, we present a QuadrupletBERT model for effective and efficient retrieval in this paper. Unlike most existing BERT-style retrieval models, which only focus on the ranking phase in retrieval systems, our model makes considerable improvements to the retrieval phase and leverages the distances between simple negative and hard negative instances to obtain better embeddings. Experimental results demonstrate that our QuadrupletBERT achieves state-of-the-art results in embedding-based large-scale retrieval tasks.

Keywords: embedding-based large-scale retrieval; embedding-based large-scale; embedding-based large-scale query-document
