Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an open vocabulary. This approach relies on consistent and correct underlying unicode sequences, and makes models susceptible to degradation from common types of noise and variation. Motivated by the robustness of human language processing, we propose the use of visual text representations, which dispense with a finite set of text embeddings in favor of continuous vocabularies created by processing visually rendered text with sliding windows. We show that models using visual text representations approach or match the performance of traditional text models on small and larger datasets. More importantly, models with visual embeddings demonstrate significant robustness to varied types of noise, achieving e.g., 25.9 BLEU on a character permuted German--English task where subword models degrade to 1.9.
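A minimal sketch of the visual-text idea described above: render a sentence as an image, slice it into overlapping windows along the width axis, and treat each flattened window as a continuous "token" vector. The window width, stride, and rendering parameters below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_text(text, height=24, width_per_char=10):
    """Render a string as a grayscale image (white text on a black background)."""
    img = Image.new("L", (width_per_char * len(text), height), color=0)
    draw = ImageDraw.Draw(img)
    draw.text((0, 0), text, fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32) / 255.0

def sliding_window_embeddings(pixels, window=20, stride=10):
    """Cut the rendered image into overlapping windows and flatten each one."""
    height, width = pixels.shape
    windows = [
        pixels[:, start:start + window].reshape(-1)   # flatten the H x window slice
        for start in range(0, max(width - window, 1), stride)
    ]
    return np.stack(windows)  # shape: (num_windows, height * window)

# Each row plays the role of a subword embedding, but is derived from pixels,
# so a sentence and its character-permuted variant still share most windows.
feats = sliding_window_embeddings(render_text("Ein Beispiel mit vertauschten Zeichen"))
print(feats.shape)
```

In a full model these window vectors would be fed to the translation encoder in place of subword embedding lookups, which is what removes the dependence on a consistent unicode segmentation.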
Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.
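A minimal sketch of the real-versus-random comparison, assuming a binary task with numeric feature vectors; the linear probe and the held-out log-loss gap used here are illustrative stand-ins for the paper's calibrated measure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def probe_gain(features, labels, seed=0):
    """Held-out log-loss improvement of a linear probe over chance (binary task)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.3, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return np.log(2) - log_loss(y_te, probe.predict_proba(X_te), labels=[0, 1])

def real_vs_random(features, labels, seed=0):
    """Contrast the fit on the real labeling with the fit on a permuted labeling."""
    rng = np.random.default_rng(seed)
    real_gain = probe_gain(features, labels, seed)
    random_gain = probe_gain(features, rng.permutation(labels), seed)
    return real_gain, random_gain

# If real_gain is much larger than random_gain, the representation exposes the
# task's label structure; if the two are close, the representation cannot
# distinguish the real labeling from noise (the BOW failure mode above).
```

Running this once with BOW features and once with MLM embeddings for the same sentences gives the kind of side-by-side comparison the abstract describes.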
Pre-trained language models have emerged as highly successful methods for learning good text representations. However, the amount of structured knowledge retained in such models, and how (if at all) it can be extracted, remains an open question. In this work, we aim to directly learn text representations which leverage structured knowledge about entities mentioned in the text. This can be particularly beneficial for downstream tasks which are knowledge-intensive. Our approach utilizes self-attention between words in the text and knowledge graph (KG) entities mentioned in the text. While existing methods require entity-linked data for pre-training, we train using a mention-span masking objective and a candidate ranking objective -- which do not require any entity links and only assume access to an alias table for retrieving candidates, enabling large-scale pre-training. We show that the proposed model learns knowledge-informed text representations that yield improvements over existing methods on downstream tasks.
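A minimal sketch of the candidate-ranking idea: retrieve candidate entities for a mention from an alias table and rank them by their similarity to the mention-span representation produced by the text encoder. The alias table, embedding dimension, and placeholder span vector below are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn.functional as F

DIM = 64
# Hypothetical alias table: surface form -> candidate KG entities.
alias_table = {"paris": ["Paris_France", "Paris_Texas", "Paris_mythology"]}
entity_emb = {e: torch.randn(DIM) for cands in alias_table.values() for e in cands}

def rank_candidates(span_repr, mention):
    """Score each candidate entity for a mention against its span representation."""
    cands = alias_table.get(mention.lower(), [])
    if not cands:
        return []
    cand_matrix = torch.stack([entity_emb[c] for c in cands])  # (C, DIM)
    scores = cand_matrix @ span_repr                            # (C,)
    probs = F.softmax(scores, dim=0)                            # soft ranking over candidates
    return sorted(zip(cands, probs.tolist()), key=lambda x: -x[1])

# During pre-training, the span representation would come from the encoder with
# the mention tokens masked, and the objective pushes up the correct candidate's
# score; no gold entity links are needed, only the alias table.
span_repr = torch.randn(DIM)  # placeholder for an encoder's mention-span output
print(rank_candidates(span_repr, "Paris"))
```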