EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications


Added by Minghui Qiu

Publication date: 2020

Field: Informatics Engineering

Language: English

Authors: Minghui Qiu, Peng Li, Chengyu Wang

Computation and Language


Pre-trained Language Models (PLMs) and Transfer Learning (TL) algorithms have been applied successfully to a wide range of Natural Language Processing (NLP) applications, yet it is not easy to build an easy-to-use and scalable TL toolkit for this purpose. To bridge this gap, the EasyTransfer platform is designed for developing deep TL algorithms for NLP applications. EasyTransfer is backed by a high-performance, scalable engine for efficient training and inference, and it integrates comprehensive deep TL algorithms to make the development of industrial-scale TL applications easier. In EasyTransfer, the built-in data and model parallelism strategies, combined with AI compiler optimization, are shown to be 4.0x faster than the community version of distributed training. EasyTransfer supports various NLP models in the ModelZoo, including mainstream PLMs and multi-modality models. It also features various TL algorithms developed in-house, together with the AppZoo for NLP applications. The toolkit lets users quickly start model training, evaluation, and online deployment. EasyTransfer is currently deployed at Alibaba to support a variety of business scenarios, including item recommendation, personalized search, and conversational question answering. Extensive experiments on real-world datasets and online applications show that EasyTransfer is suitable for online production, with cutting-edge performance for various applications. The source code of EasyTransfer is released on GitHub (https://github.com/alibaba/EasyTransfer).


System Demo for Transfer Learning across Vision and Text using Domain Specific CNN Accelerator for On-Device NLP Applications

Baohua Sun, Lin Yang, Michael Lin (2019)

Power-efficient CNN Domain Specific Accelerator (CNN-DSA) chips are currently available for wide use in mobile devices. These chips are mainly used in computer vision applications. However, recent work on the Super Characters method for text classification and sentiment analysis using two-dimensional CNN models has also achieved state-of-the-art results through transfer learning from vision to text. In this paper, we implemented text classification and sentiment analysis applications on mobile devices using CNN-DSA chips. Compact network representations using one-bit and three-bit precision for coefficients and five-bit precision for activations are used in the CNN-DSA chip, with power consumption below 300 mW. For edge devices under memory and compute constraints, the network is further compressed by approximating the external Fully Connected (FC) layers within the CNN-DSA chip. At the workshop, we have two system demonstrations for NLP tasks. The first demo classifies an input English Wikipedia sentence into one of 14 ontologies. The second demo classifies a Chinese online-shopping review as positive or negative.

Computation and Language

Deep Learning for Text Style Transfer: A Survey

Di Jin, Zhijing Jin, Zhiting Hu (2020)

Text style transfer (TST) is an important task in natural language generation (NLG), which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing (NLP), and has recently regained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also discuss a variety of important topics regarding the future development of TST. Our curated paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey

Computation and Language Artificial Intelligence Machine Learning

Easy and Efficient Transformer: Scalable Inference Solution For large NLP model

Gongzheng Li, Yadong Xi, Jingzhen Ding (2021)

Recently, large-scale transformer-based models have proven effective on a variety of tasks across many domains. Nevertheless, putting them into production is very expensive, requiring comprehensive optimization techniques to reduce inference costs. This paper introduces a series of transformer inference optimization techniques at both the algorithm level and the hardware level. These techniques include a pre-padding decoding mechanism that improves token parallelism for text generation, and highly optimized kernels designed for very long input lengths and large hidden sizes. On this basis, we propose a transformer inference acceleration library, Easy and Efficient Transformer (EET), which offers a significant performance improvement over existing libraries. Compared to Faster Transformer v4.0's implementation of a GPT-2 layer on A100, EET achieves a state-of-the-art speedup of 1.5-4.5x, depending on context length. EET is available at https://github.com/NetEase-FuXi/EET. A demo video is available at https://youtu.be/22UPcNGcErg.

Computation and Language

An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models

Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos (2019)

A growing number of state-of-the-art transfer learning methods employ language models pretrained on large generic corpora. In this paper, we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. Specifically, we combine the task-specific optimization function with an auxiliary language model objective, which is adjusted during the training process. This preserves language regularities captured by language models, while enabling sufficient adaptation for solving the target task. Our method does not require pretraining or finetuning separate components of the network, and we train our models end-to-end in a single step. We present results on a variety of challenging affective and text classification tasks, surpassing well-established transfer learning methods with a greater level of complexity.
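The combined objective described in this abstract can be illustrated with a minimal sketch. The abstract only says that the auxiliary language model objective is "adjusted during the training process"; the exponential decay schedule, the function names, and the default values of `gamma0` and `decay` below are assumptions made for illustration, not details taken from the paper.

```python
import math


def auxiliary_weight(step, gamma0=0.2, decay=0.001):
    """Weight of the auxiliary LM loss at a given training step.

    Assumption: exponential decay from gamma0 towards 0. The abstract
    does not specify how the weight is adjusted.
    """
    return gamma0 * math.exp(-decay * step)


def combined_loss(task_loss, lm_loss, step, gamma0=0.2, decay=0.001):
    """Task loss plus the down-weighted auxiliary language model loss."""
    return task_loss + auxiliary_weight(step, gamma0, decay) * lm_loss


if __name__ == "__main__":
    # Early in training the LM term contributes more than it does later.
    for step in (0, 1000, 5000):
        print(step, combined_loss(task_loss=1.0, lm_loss=2.0, step=step))
```

Under this assumed schedule, the language model term acts as a regularizer early on and fades as training proceeds, which is one plausible way to balance retaining language regularities against adapting to the target task.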

Computation and Language Machine Learning

Simple, Scalable, and Stable Variational Deep Clustering

Lele Cao, Sahar Asadi, Wenfei Zhu (2020)

Deep clustering (DC) has become the state-of-the-art for unsupervised clustering. In principle, DC represents a variety of unsupervised methods that jointly learn the underlying clusters and the latent representation directly from unstructured datasets. However, DC methods are generally poorly applied due to high operational costs, low scalability, and unstable results. In this paper, we first evaluate several popular DC variants in the context of industrial applicability using eight empirical criteria. We then choose to focus on variational deep clustering (VDC) methods, since they mostly meet those criteria except for simplicity, scalability, and stability. To address these three unmet criteria, we introduce four generic algorithmic improvements: initial $\gamma$-training, periodic $\beta$-annealing, mini-batch GMM (Gaussian mixture model) initialization, and inverse min-max transform. We also propose a novel clustering algorithm S3VDC (simple, scalable, and stable VDC) that incorporates all those improvements. Our experiments show that S3VDC outperforms the state-of-the-art on both benchmark tasks and a large unstructured industrial dataset without any ground truth label. In addition, we analytically evaluate the usability and interpretability of S3VDC.
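To give a rough sense of what a periodic annealing schedule can look like, the sketch below computes a weight $\beta$ that ramps linearly from 0 to a maximum within each period and then holds until the period restarts. This is a generic cyclical schedule written under assumptions: the abstract does not define the shape of the schedule or which loss term $\beta$ scales, and the names `periodic_beta`, `period`, `ramp_fraction`, and `beta_max` are hypothetical.

```python
def periodic_beta(step, period=1000, ramp_fraction=0.5, beta_max=1.0):
    """Cyclical weight: linear ramp from 0 to beta_max, then hold.

    Assumption: a sawtooth-with-plateau shape; the schedule used in the
    paper may differ.
    """
    position = (step % period) / period  # position within the cycle, in [0, 1)
    if position < ramp_fraction:
        return beta_max * position / ramp_fraction
    return beta_max


if __name__ == "__main__":
    # The weight restarts from 0 at the beginning of every period.
    for step in (0, 250, 500, 750, 1000, 1250):
        print(step, periodic_beta(step))
```

In a variational model, a weight like this would typically multiply a regularization term of the loss, so that each cycle begins with the term switched off and gradually reintroduces it.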

Machine Learning