Gaussian Mixture Embeddings for Multiple Word Prototypes

82 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xinchi Chen

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xinchi Chen - Xipeng Qiu - Jingxiang Jiang

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recently, word representation has been increasingly focused on for its excellent properties in representing the word semantics. Previous works mainly suffer from the problem of polysemy phenomenon. To address this problem, most of previous models represent words as multiple distributed vectors. However, it cannot reflect the rich relations between words by representing words as points in the embedded space. In this paper, we propose the Gaussian mixture skip-gram (GMSG) model to learn the Gaussian mixture embeddings for words based on skip-gram framework. Each word can be regarded as a gaussian mixture distribution in the embedded space, and each gaussian component represents a word sense. Since the number of senses varies from word to word, we further propose the Dynamic GMSG (D-GMSG) model by adaptively increasing the sense number of words during training. Experiments on four benchmarks show the effectiveness of our proposed model.

قيم البحث

114 - Jingkang Wang , Jianing Zhou , Jie Zhou 2018

Chinese word segmentation (CWS) is often regarded as a character-based sequence labeling task in most current works which have achieved great success with the help of powerful neural networks. However, these works neglect an important clue: Chinese c haracters incorporate both semantic and phonetic meanings. In this paper, we introduce multiple character embeddings including Pinyin Romanization and Wubi Input, both of which are easily accessible and effective in depicting semantics of characters. We propose a novel shared Bi-LSTM-CRF model to fuse linguistic features efficiently by sharing the LSTM network during the training procedure. Extensive experiments on five corpora show that extra embeddings help obtain a significant improvement in labeling accuracy. Specifically, we achieve the state-of-the-art performance in AS and CityU corpora with F1 scores of 96.9 and 97.3, respectively without leveraging any external lexical resources.

الحساب واللغة

Evaluating Neural Word Embeddings for Sanskrit

68 - Jivnesh Sandhan , Om Adideva , Digumarthi Komal 2021

Recently, the supervised learning paradigms surprisingly remarkable performance has garnered considerable attention from Sanskrit Computational Linguists. As a result, the Sanskrit community has put laudable efforts to build task-specific labeled dat a for various downstream Natural Language Processing (NLP) tasks. The primary component of these approaches comes from representations of word embeddings. Word embedding helps to transfer knowledge learned from readily available unlabelled data for improving task-specific performance in low-resource setting. Last decade, there has been much excitement in the field of digitization of Sanskrit. To effectively use such readily available resources, it is very much essential to perform a systematic study on word embedding approaches for the Sanskrit language. In this work, we investigate the effectiveness of word embeddings. We classify word embeddings in broad categories to facilitate systematic experimentation and evaluate them on four intrinsic tasks. We investigate the efficacy of embeddings approaches (originally proposed for languages other than Sanskrit) for Sanskrit along with various challenges posed by language.

الحساب واللغة

Word Embeddings: A Survey

160 - Felipe Almeida , Geraldo Xexeo 2019

This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in additio n to encoding surprisingly good syntactic and semantic information, have been proven useful as extra features in many downstream NLP tasks.

الحساب واللغة التعلم الآلي التعلم الالي

Compositional Demographic Word Embeddings

280 - Charles Welch , Jonathan K. Kummerfeld , Veronica Perez-Rosas 2020

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve lang uage model performance and other language processing tasks, they can only be computed for people with a large amount of longitudinal data, which is not the case for new users. We propose a new form of personalized word embeddings that use demographic-specific word representations derived compositionally from full or partial demographic information for a user (i.e., gender, age, location, religion). We show that the resulting demographic-aware word representations outperform generic word representations on two tasks for English: language modeling and word associations. We further explore the trade-off between the number of available attributes and their relative effectiveness and discuss the ethical implications of using them.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

Dynamic Word Embeddings for Evolving Semantic Discovery

305 - Zijun Yao , Yifan Sun , Weicong Ding 2017

Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human histo ry. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves the resulting alignment problem. This model is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies of temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.

الحساب واللغة التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الحواش الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Gaussian Mixture Embeddings for Multiple Word Prototypes

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً