بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

ViCo: Word Embeddings from Visual Co-occurrences

109 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tanmay Gupta

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Tanmay Gupta - Alexander Schwing - Derek Hoiem

الرؤية الحاسوبية وتمييز الأنماط الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose to learn word embeddings from visual co-occurrences. Two words co-occur visually if both words apply to the same image or image region. Specifically, we extract four types of visual co-occurrences between object and attribute words from large-scale, textually-annotated visual databases like VisualGenome and ImageNet. We then train a multi-task log-bilinear model that compactly encodes word meanings represented by each co-occurrence type into a single visual word-vector. Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone. We further evaluate our embeddings on five downstream applications, four of which are vision-language tasks. Augmenting GloVe with our embeddings yields gains on all tasks. We also find that random embeddings perform comparably to learned embeddings on all supervised vision-language tasks, contrary to conventional wisdom.

قيم البحث

100 - Satwik Kottur , Ramakrishna Vedantam , Jose M. F. Moura 2015

We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness im plicit in our visual world. For instance, although eats and stares at seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like eats and stares at into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.

الرؤية الحاسوبية وتمييز الأنماط الحساب واللغة

Composing Knowledge Graph Embeddings via Word Embeddings

290 - Lianbo Ma , Peng Sun , Zhiwei Lin 2019

Learning knowledge graph embedding from an existing knowledge graph is very important to knowledge graph completion. For a fact $(h,r,t)$ with the head entity $h$ having a relation $r$ with the tail entity $t$, the current approaches aim to learn low dimensional representations $(mathbf{h},mathbf{r},mathbf{t})$, each of which corresponds to the elements in $(h, r, t)$, respectively. As $(mathbf{h},mathbf{r},mathbf{t})$ is learned from the existing facts within a knowledge graph, these representations can not be used to detect unknown facts (if the entities or relations never occur in the knowledge graph). This paper proposes a new approach called TransW, aiming to go beyond the current work by composing knowledge graph embeddings using word embeddings. Given the fact that an entity or a relation contains one or more words (quite often), it is sensible to learn a mapping function from word embedding spaces to knowledge embedding spaces, which shows how entities are constructed using human words. More importantly, composing knowledge embeddings using word embeddings makes it possible to deal with the emerging new facts (either new entities or relations). Experimental results using three public datasets show the consistency and outperformance of the proposed TransW.

الذكاء الاصطناعي الحساب واللغة التعلم الآلي

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

121 - Bowen Zhang , Hexiang Hu , Fei Sha 2020

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. We use the embeddings of random ly sampled nouns from the groundtruth stories as the target anchor word embeddings to learn the predictor. To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model. As opposed to state-of-the-art methods, the proposed model is simple in design, easy to optimize, and attains the best results in most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.

الرؤية الحاسوبية وتمييز الأنماط

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

290 - Bowen Shi , Shane Settle , Karen Livescu 2020

Segmental models are sequence prediction models in which scores of hypotheses are based on entire variable-length segments of frames. We consider segmental models for whole-word (acoustic-to-word) speech recognition, with the feature vectors defined using vector embeddings of segments. Such models are computationally challenging as the number of paths is proportional to the vocabulary size, which can be orders of magnitude larger than when using subword units like phones. We describe an efficient approach for end-to-end whole-word segmental models, with forward-backward and Viterbi decoding performed on a GPU and a simple segment scoring function that reduces space complexity. In addition, we investigate the use of pre-training via jointly trained acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) of written word labels. We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs. Our final models improve over prior A2W models.

معالجة الصوت والكلام الحساب واللغة أنظمة الصوت في الحاسوب

A Novel Method of Extracting Topological Features from Word Embeddings

67 - Shafie Gholizadeh , Armin Seyeditabari , Wlodek Zadrozny 2020

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topol ogical data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.

التعلم الآلي الحساب واللغة الطوبولوجيا الجبرية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الأندلس للعلوم الطبية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

ViCo: Word Embeddings from Visual Co-occurrences

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً