Do you want to publish a course? Click here

Decomposing Word Embedding with the Capsule Network

259   0   0.0 ( 0 )
 Added by Xin Liu
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Word sense disambiguation tries to learn the appropriate sense of an ambiguous word in a given context. The existing pre-trained language methods and the methods based on multi-embeddings of word did not explore the power of the unsupervised word embedding sufficiently. In this paper, we discuss a capsule network-based approach, taking advantage of capsules potential for recognizing highly overlapping features and dealing with segmentation. We propose a Capsule network-based method to Decompose the unsupervised word Embedding of an ambiguous word into context specific Sense embedding, called CapsDecE2S. In this approach, the unsupervised ambiguous embedding is fed into capsule network to produce its multiple morpheme-like vectors, which are defined as the basic semantic language units of meaning. With attention operations, CapsDecE2S integrates the word context to reconstruct the multiple morpheme-like vectors into the context-specific sense embedding. To train CapsDecE2S, we propose a sense matching training method. In this method, we convert the sense learning into a binary classification that explicitly learns the relation between senses by the label of matching and non-matching. The CapsDecE2S was experimentally evaluated on two sense learning tasks, i.e., word in context and word sense disambiguation. Results on two public corpora Word-in-Context and English all-words Word Sense Disambiguation show that, the CapsDecE2S model achieves the new state-of-the-art for the word in context and word sense disambiguation tasks.



rate research

Read More

Word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.
323 - Abhik Jana , Pawan Goyal 2018
Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model (Word2vec) or dense count based model (GloVe), others attempt to represent these in a distributional thesaurus network structure where the neighborhood of a word is a set of words having adequate context overlap. Being motivated by recent surge of research in network embedding techniques (DeepWalk, LINE, node2vec etc.), we turn a distributional thesaurus network into dense word vectors and investigate the usefulness of distributional thesaurus embedding in improving overall word representation. This is the first attempt where we show that combining the proposed word representation obtained by distributional thesaurus embedding with the state-of-the-art word representations helps in improving the performance by a significant margin when evaluated against NLP tasks like word similarity and relatedness, synonym detection, analogy detection. Additionally, we show that even without using any handcrafted lexical resources we can come up with representations having comparable performance in the word similarity and relatedness tasks compared to the representations where a lexical resource has been used.
To disclose overlapped multiple relations from a sentence still keeps challenging. Most current works in terms of neural models inconveniently assuming that each sentence is explicitly mapped to a relation label, cannot handle multiple relations properly as the overlapped features of the relations are either ignored or very difficult to identify. To tackle with the new issue, we propose a novel approach for multi-labeled relation extraction with capsule network which acts considerably better than current convolutional or recurrent net in identifying the highly overlapped relations within an individual sentence. To better cluster the features and precisely extract the relations, we further devise attention-based routing algorithm and sliding-margin loss function, and embed them into our capsule network. The experimental results show that the proposed approach can indeed extract the highly overlapped features and achieve significant performance improvement for relation extraction comparing to the state-of-the-art works.
128 - Lingfei Wu , Ian E.H. Yen , Kun Xu 2018
While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending to generate unsupervised sentences or documents embeddings. Recent work has demonstrated that a distance measure between documents called emph{Word Movers Distance} (WMD) that aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the emph{Word Movers Embedding } (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on problems of short length.
A challenging task for word embeddings is to capture the emergent meaning or polarity of a combination of individual words. For example, existing approaches in word embeddings will assign high probabilities to the words Penguin and Fly if they frequently co-occur, but it fails to capture the fact that they occur in an opposite sense - Penguins do not fly. We hypothesize that humans do not associate a single polarity or sentiment to each word. The word contributes to the overall polarity of a combination of words depending upon which other words it is combined with. This is analogous to the behavior of microscopic particles which exist in all possible states at the same time and interfere with each other to give rise to new states depending upon their relative phases. We make use of the Hilbert Space representation of such particles in Quantum Mechanics where we subscribe a relative phase to each word, which is a complex number, and investigate two such quantum inspired models to derive the meaning of a combination of words. The proposed models achieve better performances than state-of-the-art non-quantum models on the binary sentence classification task.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا