Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy

241 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Pengcheng Yang

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Pengcheng Yang - Fuli Luo - Shuangzhi Wu

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Cross-lingual word embeddings aim to capture common linguistic regularities of different languages, which benefit various downstream tasks ranging from machine translation to transfer learning. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through a linear transformation (word mapping). In this work, we focus on learning such a word mapping without any supervision signal. Most previous work of this task adopts parametric metrics to measure distribution differences, which typically requires a sophisticated alternate optimization process, either in the form of emph{minmax game} or intermediate emph{density estimation}. This alternate optimization process is relatively hard and unstable. In order to avoid such sophisticated alternate optimization, we propose to learn unsupervised word mapping by directly maximizing the mean discrepancy between the distribution of transferred embedding and target embedding. Extensive experimental results show that our proposed model outperforms competitive baselines by a large margin.

قيم البحث

105 - Antonio H. O. Fonseca , David van Dijk 2020

Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. The alignment between word embeddings allows linking semantically equivalent words in multilingual contexts. Moreover, it offers a way to infer cross-lingual meaning for words without a direct translation. Current methods for word embedding alignment are either supervised, i.e. they require known word pairs, or learn a cross-domain transformation on fixed embeddings in an unsupervised way. Here we propose an end-to-end approach for word embedding alignment that does not require known word pairs. Our method, termed Word Alignment through MMD (WAM), learns embeddings that are aligned during sentence translation training using a localized Maximum Mean Discrepancy (MMD) constraint between the embeddings. We show that our method not only out-performs unsupervised methods, but also supervised methods that train on known word translations.

الحساب واللغة التعلم الآلي

Unsupervised Learning of Style-sensitive Word Vectors

68 - Reina Akama , Kento Watanabe , Sho Yokoi 2018

This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. We propose extending the continuous bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word vectors using a wi der context window under the assumption that the style of all the words in an utterance is consistent. In addition, we introduce a novel task to predict lexical stylistic similarity and to create a benchmark dataset for this task. Our experiment with this dataset supports our assumption and demonstrates that the proposed extensions contribute to the acquisition of style-sensitive word embeddings.

الحساب واللغة

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

89 - Armand Joulin , Piotr Bojanowski , Tomas Mikolov 2018

Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-square regression problem to learn a rotation aligning a small bil ingual lexicon, and use a retrieval criterion for inference. In this paper, we propose an unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. Our experiments on standard benchmarks show that our approach outperforms the state of the art on word translation, with the biggest improvements observed for distant language pairs such as English-Chinese.

الحساب واللغة التعلم الآلي

A Hybrid Approach to Word Sense Disambiguation Combining Supervised and Unsupervised Learning

88 - Alok Ranjan Pal , Anirban Kundu , Abhay Singh 2015

In this paper, we are going to find meaning of words based on distinct situations. Word Sense Disambiguation is used to find meaning of words based on live contexts using supervised and unsupervised approaches. Unsupervised approaches use online dict ionary for learning, and supervised approaches use manual learning sets. Hand tagged data are populated which might not be effective and sufficient for learning procedure. This limitation of information is main flaw of the supervised approach. Our proposed approach focuses to overcome the limitation using learning set which is enriched in dynamic way maintaining new data. Trivial filtering method is utilized to achieve appropriate training data. We introduce a mixed methodology having Modified Lesk approach and Bag-of-Words having enriched bags using learning methods. Our approach establishes the superiority over individual Modified Lesk and Bag-of-Words approaches based on experimentation.

الحساب واللغة

Unsupervised Text Generation by Learning from Search

140 - Jingjing Li , Zichao Li , Lili Mou 2020

In this work, we present TGLS, a novel framework to unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) est imates the quality of sentences. Then, a conditional generative model learns from the search results, and meanwhile smooth out the noise of search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, paraphrase generation and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. Especially, it achieves comparable performance with the state-of-the-art supervised methods in paraphrase generation.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشرق الأوسط - الأردن

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً