Testing the limits of unsupervised learning for semantic similarity

Published by: Muktabh Mayank Srivastava

Publication date: 2017

Research field: Informatics Engineering

Language: English

Authors: Richa Sharma, Muktabh Mayank Srivastava

Computation and Language

Abstract

The semantic similarity between two sentences can be defined as a measure of how related or unrelated they are. In terms of distributed representations, the task can be framed as generating sentence embeddings (dense vectors) that take both the context and the meaning of a sentence into account. Such embeddings can be produced by several methods; in this paper we evaluate LSTM autoencoders for generating them. Unsupervised algorithms (autoencoders in particular) only try to recreate their inputs, but they can be forced to learn order (and, to some extent, inherent meaning) by introducing suitable bottlenecks. We evaluate how well algorithms trained only on plain English sentences can learn semantic similarity, without being given any notion of what a sentence means.
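Once sentences are mapped to dense vectors, relatedness can be scored by comparing the vectors. The abstract does not say which comparison the authors use; the sketch below assumes cosine similarity, a common choice, and the three embeddings are made-up numbers rather than output from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional sentence embeddings (illustrative values only).
emb_a = [0.2, 0.7, -0.1, 0.4]
emb_b = [0.25, 0.6, 0.0, 0.5]
emb_c = [-0.6, 0.1, 0.8, -0.3]

print(cosine_similarity(emb_a, emb_b))  # close to 1: vectors point the same way
print(cosine_similarity(emb_a, emb_c))  # negative: vectors point apart
```

Cosine similarity ignores vector length, so it compares only the direction of two embeddings.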

Nina Poerner, Ulli Waltinger, Hinrich Schütze (2019)

We address the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings. We apply, extend and evaluate different meta-embedding methods from the word embedding literature at the sentence level, including dimensionality reduction (Yin and Schütze, 2016), generalized Canonical Correlation Analysis (Rastogi et al., 2015) and cross-view auto-encoders (Bollegala and Bao, 2018). Our sentence meta-embeddings set a new unsupervised state of the art (SoTA) on the STS Benchmark and on the STS12-STS16 datasets, with gains of between 3.7% and 6.4% Pearson's r over single-source systems.
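The gains above are reported in Pearson's r, the linear correlation between a system's similarity scores and human ratings. A minimal sketch of that computation follows; the gold ratings and predicted scores are invented for illustration and are not taken from any STS dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: human ratings and model similarity scores for 5 pairs.
gold = [4.8, 3.2, 1.0, 0.4, 2.5]
predicted = [0.91, 0.66, 0.30, 0.12, 0.58]

print(pearson_r(gold, predicted))
```

Because the coefficient is invariant to shifting and positive rescaling, the predicted scores do not need to be on the same scale as the human ratings.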

Computation and Language

Learning Semantic Textual Similarity from Conversations

Yinfei Yang, Steve Yuan, Daniel Cer (2018)

We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational input-response pairs. The resulting sentence embeddings perform well on the semantic textual similarity (STS) benchmark and SemEval 2017's Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training combining the conversational input-response prediction task and a natural language inference task. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS benchmark and is competitive with the state-of-the-art feature-engineered and mixed systems in both tasks.
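One way to picture input-response prediction is as ranking candidate responses for a given input. The sketch below assumes dot-product scoring followed by a softmax over candidates; the abstract does not specify the scoring function, and the vectors are made-up stand-ins for encoder outputs.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def response_probabilities(input_vec, candidate_vecs):
    """Softmax over dot-product scores of each candidate against the input."""
    scores = [dot(input_vec, c) for c in candidate_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings: one input sentence and three candidate responses.
input_vec = [0.9, 0.1, 0.3]
candidates = [
    [0.8, 0.2, 0.4],   # assumed to be the true response
    [-0.5, 0.9, 0.0],
    [0.1, -0.7, 0.2],
]

probs = response_probabilities(input_vec, candidates)
print(probs)
```

Training such a model would push the probability of the true response up, which requires no similarity labels beyond the pairing already present in the conversation data.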

Computation and Language

R-grams: Unsupervised Learning of Semantic Units in Natural Language

Ariel Ekgren, Amaru Cuba Gyllensten, Magnus Sahlgren (2018)

This paper investigates data-driven segmentation using Re-Pair or Byte Pair Encoding techniques. In contrast to previous work which has primarily been focused on subword units for machine translation, we are interested in the general properties of such segments above the word level. We call these segments r-grams, and discuss their properties and the effect they have on the token frequency distribution. The proposed approach is evaluated by demonstrating its viability in embedding techniques, both in monolingual and multilingual test settings. We also provide a number of qualitative examples of the proposed methodology, demonstrating its viability as a language-invariant segmentation procedure.
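Re-Pair and Byte Pair Encoding both work by repeatedly replacing the most frequent adjacent pair of symbols with a single new symbol. The sketch below applies that idea to whole words rather than characters, so the merged units sit above the word level. It is a simplified illustration, not the authors' procedure; the function names and the `min_count` stopping threshold are assumptions.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return ((left, right), count) for the most common adjacent pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0]


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_segments(tokens, num_merges, min_count=2):
    """Greedily merge frequent adjacent pairs, up to `num_merges` times."""
    for _ in range(num_merges):
        best = most_frequent_pair(tokens)
        if best is None or best[1] < min_count:
            break  # no pair repeats often enough to be worth merging
        tokens = merge_pair(tokens, best[0])
    return tokens


words = "new york is big and new york is old".split()
print(learn_segments(words, num_merges=5))
```

On this toy input the repeated sequence "new york is" ends up as a single multi-word token, while the words that occur only once are left unmerged.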

Computation and Language, Machine Learning

Evolution of Semantic Similarity -- A Survey

Dhivya Chandrasekaran, Vijay Mago (2020)

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. In order to address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network-based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place, for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.

Computation and Language, Information Retrieval

Ziwei Wang, Yunsong Wang, Ziyi Wu (2021)

In this paper, we propose an instance similarity learning (ISL) method for unsupervised feature representation. Conventional methods assign high similarity to instance pairs that are close in the feature space, which usually leads to wrong pairwise relationships for large neighborhoods because the Euclidean distance fails to depict the true semantic similarity on the feature manifold. In contrast, our method mines the feature manifold in an unsupervised manner, through which the semantic similarity among instances is learned in order to obtain discriminative representations. Specifically, we employ Generative Adversarial Networks (GANs) to mine the underlying feature manifold, where the generated features are applied as proxies to progressively explore the feature manifold so that the semantic similarity among instances is acquired as reliable pseudo supervision. Extensive experiments on image classification demonstrate the superiority of our method compared with the state-of-the-art methods. The code is available at https://github.com/ZiweiWangTHU/ISL.git.

Computer Vision and Pattern Recognition
