
Unsupervised Representation Disentanglement of Text: An Evaluation on Synthetic Datasets


Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

unsupervised representation disentanglement, representation disentanglement, synthetic datasets


Abstract

To highlight the challenges of achieving unsupervised representation disentanglement in the text domain, we select a representative set of models that have been applied successfully in the image domain. We evaluate these models on six disentanglement metrics, as well as on downstream classification tasks and homotopy. To facilitate the evaluation, we propose two synthetic datasets with known generative factors. Our experiments highlight the existing gap in the text domain and illustrate that certain elements, such as representation sparsity (as an inductive bias) or coupling of the representation with the decoder, could impact disentanglement. To the best of our knowledge, our work is the first attempt at the intersection of unsupervised representation disentanglement and text, and it provides the experimental framework and datasets for examining future developments in this direction.
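The abstract does not describe how the homotopy evaluation is carried out. One common reading is that points are taken along a straight line between two latent codes and each point is then decoded. The sketch below shows only the interpolation step under that assumption; the function name `interpolate_latents` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List


def interpolate_latents(
    z_start: List[float], z_end: List[float], steps: int
) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from z_start to z_end.

    Both endpoints are included. This is an assumed, generic form of
    latent-space interpolation, not the paper's evaluation code.
    """
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimensionality")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")

    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


if __name__ == "__main__":
    for point in interpolate_latents([0.0, 2.0], [1.0, 0.0], steps=3):
        print(point)
```

In a full pipeline each interpolated point would be passed to a trained decoder to produce a sentence; that decoder is outside the scope of this sketch.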

References used

https://aclanthology.org/


Unsupervised Contextualized Document Representation

352 - Association for Computational Linguistics, 2021, article

Several NLP tasks need an effective representation of text documents. Arora et al. (2017) demonstrate that simple weighted averaging of word vectors frequently outperforms neural models. SCDV (Mekala et al., 2017) further extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors. However, both techniques ignore the polysemy and contextual character of words. In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT (Devlin et al., 2019) based word embeddings for word sense disambiguation with the SCDV soft clustering approach. We show that our embeddings outperform the original SCDV, pre-trained BERT, and several other baselines on many classification datasets. We also demonstrate our embeddings' effectiveness on other tasks, such as concept matching and sentence similarity. In addition, we show that SCDV+BERT(ctxd) outperforms fine-tuned BERT and different embedding approaches in scenarios with limited data and only few-shot examples.
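The weighted averaging mentioned above can be sketched as follows. The abstract does not state the weighting scheme; this sketch assumes the smooth inverse frequency form a / (a + p(w)) associated with Arora et al. (2017) and leaves out the common-component removal step of that method. The function name, the toy vectors, and the word probabilities are all hypothetical.

```python
from typing import Dict, List


def weighted_average_embedding(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    word_prob: Dict[str, float],
    a: float = 1e-3,
) -> List[float]:
    """Average word vectors, down-weighting frequent words.

    Each known token gets weight a / (a + p(w)), where p(w) is its
    (assumed) unigram probability. Unknown tokens are skipped.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        if tok not in vectors:
            continue  # no pre-computed vector for this token
        weight = a / (a + word_prob.get(tok, 0.0))
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
        count += 1
    if count == 0:
        return total  # all-zero vector when nothing matched
    return [x / count for x in total]


if __name__ == "__main__":
    toy_vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
    toy_probs = {"the": 0.099, "cat": 0.001}
    print(weighted_average_embedding(["the", "cat"], toy_vectors, toy_probs))
```

With these toy values the rare word "cat" contributes far more to the average than the frequent word "the", which is the intended effect of the weighting.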

unsupervised contextualized document, contextualized document representation, contextualized document

Evaluating Text Generation from Discourse Representation Structures

416 - Association for Computational Linguistics, 2021, article

We present an end-to-end neural approach to generate English sentences from formal meaning representations, Discourse Representation Structures (DRSs). We use a rather standard bi-LSTM sequence-to-sequence model, work with a linearized DRS input representation, and evaluate character-level and word-level decoders. We obtain very encouraging results in terms of reference-based automatic metrics such as BLEU. But because such metrics only evaluate the surface level of generated output, we develop a new metric, ROSE, that targets specific semantic phenomena. We do this with five DRS generation challenge sets focusing on tense, grammatical number, polarity, named entities and quantities. The aim of these challenge sets is to assess the neural generator's systematicity and generalization to unseen inputs.

evaluating text generation

Text Generation from Discourse Representation Structures

438 - Association for Computational Linguistics, 2021, article

We propose neural models to generate text from formal meaning representations based on Discourse Representation Structures (DRSs). DRSs are document-level representations which encode rich semantic detail pertaining to rhetorical relations, presupposition, and co-reference within and across sentences. We formalize the task of neural DRS-to-text generation and provide modeling solutions for the problems of condition ordering and variable naming which render generation from DRSs non-trivial. Our generator relies on a novel sibling treeLSTM model which is able to accurately represent DRS structures and is more generally suited to trees with wide branches. We achieve competitive performance (59.48 BLEU) on the GMB benchmark against several strong baselines.

discourse representation structures, discourse representation, representation structures

Word Representation Learning in Multimodal Pre-Trained Transformers: An Intrinsic Evaluation

279 - Association for Computational Linguistics, 2021, article

This study carries out a systematic intrinsic evaluation of the semantic representations learned by state-of-the-art pre-trained multimodal Transformers. These representations are claimed to be task-agnostic and shown to help on many downstream language-and-vision tasks. However, the extent to which they align with human semantic intuitions remains unclear. We experiment with various models and obtain static word representations from the contextualized ones they learn. We then evaluate them against the semantic judgments provided by human speakers. In line with previous evidence, we observe a generalized advantage of multimodal representations over language-only ones on concrete word pairs, but not on abstract ones. On the one hand, this confirms the effectiveness of these models in aligning language and vision, which results in better semantic representations for concepts that are grounded in images. On the other hand, models are shown to follow different representation learning patterns, which sheds some light on how and when they perform multimodal integration.

systematic intrinsic evaluation, multimodal pre-trained transformers

Contextualizing Variation in Text Style Transfer Datasets

423 - Association for Computational Linguistics, 2021, article

Text style transfer involves rewriting the content of a source sentence in a target style. Despite there being a number of style tasks with available data, there has been limited systematic discussion of how text style datasets relate to each other. This understanding, however, is likely to have implications for selecting multiple data sources for model training. While it is prudent to consider inherent stylistic properties when determining these relationships, we also must consider how a style is realized in a particular dataset. In this paper, we conduct several empirical analyses of existing text style datasets. Based on our results, we propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.

contextualizing variation, text style
