Representation learning for text via pretraining a language model on a large corpus has become a standard starting point for building NLP systems. This approach stands in contrast to autoencoders, also trained on raw text, but with the objective of learning to encode each input as a vector that allows full reconstruction. Autoencoders are attractive because of their latent space structure and generative properties. We therefore explore the construction of a sentence-level autoencoder from a pretrained, frozen transformer language model. We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder. We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer (an example of controlled generation), and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
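The denoising setup described above can be illustrated with a minimal sketch in plain Python. It shows only how a noisy input and its reconstruction target might be paired, and how per-token vectors could be collapsed into one sentence vector. The abstract does not specify the mask symbol, the masking rate, or the pooling used in the bottleneck, so `MASK`, `mask_prob`, `corrupt`, `denoising_example`, and `mean_pool` are all hypothetical names and choices made for illustration, not the authors' implementation.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol; the abstract does not name one


def corrupt(tokens, mask_prob, rng):
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def denoising_example(tokens, mask_prob=0.15, seed=0):
    """Build one (noisy input, reconstruction target) pair.

    Assumption: the target is the full original sentence, reflecting the
    autoencoder goal of full reconstruction, rather than only the masked
    positions as in standard masked language modeling.
    """
    rng = random.Random(seed)
    noisy = corrupt(tokens, mask_prob, rng)
    return noisy, list(tokens)


def mean_pool(hidden_states):
    """Collapse per-token vectors into one fixed-size sentence vector.

    Assumption: mean pooling stands in for the sentence bottleneck, whose
    actual form is not given in the abstract.
    """
    count = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(vec[d] for vec in hidden_states) / count for d in range(dim)]
```

In a full system under these assumptions, the noisy tokens would pass through the frozen pretrained encoder, the pooled vector would be the only information handed to the decoder, and gradients would update just the bottleneck and the single-layer decoder.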
References used
https://aclanthology.org/