
Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Image captioning systems are expected to combine individual concepts when describing scenes that contain concept combinations not observed during training. Despite significant progress in image captioning driven by the autoregressive generation framework, current approaches fail to generalize well to novel concept combinations. We propose a new framework that retrieves several similar image-caption training instances (retrieval), performs analogical reasoning over relevant entities in the retrieved prototypes (analogy), and enhances the generation process with the reasoning outcomes (composition). Our method augments the generation model by referring to neighboring instances in the training set to produce novel concept combinations in generated captions. We perform experiments on widely used image captioning benchmarks. The proposed models achieve substantial improvements over the compared baselines on both composition-related evaluation metrics and conventional image captioning metrics.
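A minimal sketch can illustrate the retrieval and analogy steps described in the abstract. It assumes that image features are already available as vectors and retrieves the nearest training instances by cosine similarity. It then performs a toy analogy: entities in each retrieved caption are swapped for entities assumed to be detected in the query image, paired by position. The function names, the toy data, and the substitution rule are hypothetical stand-ins, not the authors' method. The composition step (conditioning a caption decoder on the reasoning outcomes) is not shown.

```python
import numpy as np

def retrieve_neighbors(query_feat, train_feats, k=2):
    """Retrieval: indices of the k training images whose features are most
    cosine-similar to the query image's features."""
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q  # one cosine similarity per training image
    return np.argsort(-sims)[:k].tolist()

def analogize(caption, proto_entities, query_entities):
    """Analogy (hypothetical toy stand-in): rewrite a retrieved caption by swapping
    each prototype entity for the query entity paired with it by position."""
    mapping = dict(zip(proto_entities, query_entities))
    return " ".join(mapping.get(word, word) for word in caption.split())

# Hypothetical toy training set: image features, captions, and entity lists.
train_feats = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
train_captions = ["a dog runs on the grass", "a dog sleeps on the sofa", "a red bus on the street"]
train_entities = [["dog", "grass"], ["dog", "sofa"], ["bus", "street"]]

query_feat = np.array([1.0, 0.05, 0.0])
query_entities = ["cat", "grass"]  # assumed output of an entity detector

for i in retrieve_neighbors(query_feat, train_feats, k=2):
    print(analogize(train_captions[i], train_entities[i], query_entities))
```

With this toy data, the two nearest neighbors are the two dog captions. The script prints "a cat runs on the grass" followed by "a cat sleeps on the grass". The second caption is a combination absent from the toy training captions.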

Exploiting Image-Text Synergy for Contextual Image Captioning

Association for Computational Linguistics, 2021 (article)

Modern web content (news articles, blog posts, educational resources, marketing brochures) is predominantly multimodal. A notable trait is the inclusion of media such as images placed at meaningful locations within a textual narrative. Most often, such images are accompanied by captions, either factual or stylistic (humorous, metaphorical, etc.), that make the narrative more engaging to the reader. While standalone image captioning has been extensively studied, captioning an image based on external knowledge such as its surrounding text remains under-explored. In this paper, we study this new task: given an image and an associated unstructured knowledge snippet, the goal is to generate a contextual caption for the image.

Keywords: text synergy, synergy for contextual

Content-Based Image Retrieval (CBIR)

Tishreen University, 2013 (graduation project)

In this study, carried out as a fourth-year semester project, we aim to shed light on retrieving images from a large collection based on the content of a target image. We supported the study with an application, built in the MATLAB environment, that searches for images similar to an input image. Our research focused on two important features that almost every content-based image retrieval system uses: the color histogram and image texture. We explained the steps by which retrieval is carried out. These run from analyzing the image and extracting its descriptor vector to matching that vector against the feature vectors of the images in the database, so that the images are ranked by how similar they are to the target image. The study also considered using the HMMD color space as an alternative to the RGB color space for extracting color-structure descriptors. Because HMMD is a user-oriented color model, this should yield better results that satisfy the user. We supported the study with a number of figures, examples, and diagrams that illustrate the theoretical content and what we implemented in the MATLAB application.
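The retrieval pipeline described above can be sketched in Python with NumPy: extract a descriptor, match it against the stored descriptors, then rank by similarity. This illustration rests on stated assumptions and is not the project's MATLAB application. It uses a quantized joint RGB histogram rather than the HMMD color-structure descriptor and omits the texture feature. It scores images with histogram intersection, which is one common choice of similarity measure. The toy images are hypothetical.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Descriptor: normalized joint RGB histogram with `bins` levels per channel."""
    q = (image.astype(np.int64) * bins) // 256               # per-channel bin index in [0, bins)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]  # joint (R, G, B) bin index
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

def rank_database(query_img, database, bins=4):
    """Match the query descriptor against every stored image and sort by similarity."""
    hq = color_histogram(query_img, bins)
    scores = {name: histogram_intersection(hq, color_histogram(img, bins))
              for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 8x8 images: pure red, half red / half magenta, pure blue.
red = np.zeros((8, 8, 3), dtype=np.uint8)
red[..., 0] = 220
reddish = red.copy()
reddish[:4, :, 2] = 200
blue = np.zeros((8, 8, 3), dtype=np.uint8)
blue[..., 2] = 220

print(rank_database(red, {"blue": blue, "reddish": reddish, "red": red}))
```

Replacing `color_histogram` with an HMMD-based descriptor would change only the feature-extraction step. The matching and ranking steps stay the same.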

Keywords: facial characteristic points (FCP), information retrieval, CBIR, feature extraction, image processing, image search engine

Compositional Generalization via Semantic Tagging

Association for Computational Linguistics, 2021 (article)

Although neural sequence-to-sequence models have been successfully applied to semantic parsing, they fail at compositional generalization, i.e., they are unable to systematically generalize to unseen compositions of seen components. Motivated by traditional semantic parsing where compositionality is explicitly accounted for by symbolic grammars, we propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models while featuring lexicon-style alignments and disentangled information processing. Specifically, we decompose decoding into two phases where an input utterance is first tagged with semantic symbols representing the meaning of individual words, and then a sequence-to-sequence model is used to predict the final meaning representation conditioning on the utterance and the predicted tag sequence. Experimental results on three semantic parsing datasets show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.
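The data flow between the two decoding phases can be illustrated with a short sketch. A hypothetical word-to-symbol lexicon stands in for the trained tagger of phase 1. Phase 2 appears only as the combined input that a sequence-to-sequence model could condition on. The symbol names, the `word|tag` input format, and the lexicon are assumptions for illustration; the paper's actual tag set and the way it feeds tags to the decoder may differ.

```python
# Hypothetical lexicon standing in for a trained phase-1 tagger.
LEXICON = {
    "largest": "ARGMAX",
    "river": "TYPE:river",
    "in": "REL:loc",
    "texas": "ENTITY:texas",
}
NULL_TAG = "O"  # words that carry no semantic symbol of their own

def tag_utterance(tokens):
    """Phase 1: assign each word a semantic symbol (here by dictionary lookup)."""
    return [LEXICON.get(tok.lower(), NULL_TAG) for tok in tokens]

def build_decoder_input(tokens, tags):
    """Phase 2 input: pair every word with its predicted tag so that a
    sequence-to-sequence model can condition on both the utterance and the tags."""
    return " ".join(f"{tok}|{tag}" for tok, tag in zip(tokens, tags))

tokens = "what is the largest river in Texas".split()
tags = tag_utterance(tokens)
print(build_decoder_input(tokens, tags))
```

Because the tagger works word by word, a phrase like "largest river" receives the same symbols in any new sentence. This is the kind of lexicon-style alignment the abstract credits for better compositional generalization.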

Keywords: compositional generalization, semantic tagging

Analysis study of Content Based Medical Image Retrieval Systems

Damascus University, 2012 (research paper)

Content-Based Medical Image Retrieval (CBMIR) is a new technique that researchers aim to integrate with Computer-Aided Diagnosis systems. These systems find and retrieve, from a large image database, images whose content is similar to that of a query image. Retrieval works by extracting visual features from the query image and arranging them into a feature vector. This vector is compared with the feature vectors of the images in the database by computing similarity measures, and the images whose content is most similar to the query image are retrieved. This study surveys and analyzes the current state of CBMIR systems, evaluates the findings of the survey, and proposes specific research directions in the field.
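The retrieval procedure described in this abstract can be sketched with NumPy: build feature vectors, compute a similarity measure, and rank. The sketch assumes the visual features have already been extracted into fixed-length vectors. The vectors below are made-up toy values. Euclidean distance and cosine similarity are shown only as two common measures, not necessarily the ones covered by the survey. The example shows that the choice of measure can change which image is retrieved.

```python
import numpy as np

def euclidean_distance(a, b):
    """Smaller is more similar."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Larger is more similar; ignores vector magnitude."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, db_vecs, k=2, measure="cosine"):
    """Rank database feature vectors against the query and return the top-k indices."""
    if measure == "cosine":
        scores = [cosine_similarity(query_vec, v) for v in db_vecs]
        order = np.argsort(scores)[::-1]  # highest similarity first
    elif measure == "euclidean":
        scores = [euclidean_distance(query_vec, v) for v in db_vecs]
        order = np.argsort(scores)        # smallest distance first
    else:
        raise ValueError(f"unknown measure: {measure}")
    return order[:k].tolist()

# Hypothetical feature vectors for a query image and two database images.
query = np.array([1.0, 1.0])
database = np.array([[2.0, 2.0], [1.0, 0.8]])

print(retrieve(query, database, k=1, measure="cosine"))
print(retrieve(query, database, k=1, measure="euclidean"))
```

The cosine measure prefers `[2.0, 2.0]`, which points in the same direction as the query. Euclidean distance prefers `[1.0, 0.8]`, which is closer in absolute terms.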

Keywords: content-based medical image retrieval, visual feature extraction, feature vector, similarity measure

Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention

Association for Computational Linguistics, 2021 (article)

We describe a span-level supervised attention loss that improves compositional generalization in semantic parsers. Our approach builds on existing losses that encourage attention maps in neural sequence-to-sequence models to imitate the output of classical word alignment algorithms. Where past work has used word-level alignments, we focus on spans; borrowing ideas from phrase-based machine translation, we align subtrees in semantic parses to spans of input sentences, and encourage neural attention mechanisms to mimic these alignments. This method improves the performance of transformers, RNNs, and structured decoders on three benchmarks of compositional generalization.
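One way to write such a span-level loss is sketched below with NumPy. It assumes the model exposes an attention matrix whose rows are distributions over input tokens, and that an aligner has already mapped some output steps to input spans. Each supervised row is pushed toward a uniform distribution over its aligned span via cross-entropy. This target shape is an assumption for illustration, and the paper's exact loss may differ; for example, it might only reward the total attention mass inside the span. The function name and toy values are hypothetical.

```python
import numpy as np

def span_attention_loss(attn, spans, eps=1e-9):
    """Cross-entropy between attention rows and uniform targets over aligned input spans.

    attn:  (T_out, T_in) array; each row is a softmax attention distribution.
    spans: dict {output_step: (start, end)} giving the aligned input span [start, end)
           for supervised steps; output steps without an alignment are skipped.
    """
    losses = []
    for t, (start, end) in spans.items():
        target = np.zeros(attn.shape[1])
        target[start:end] = 1.0 / (end - start)  # spread target mass evenly over the span
        losses.append(-np.sum(target * np.log(attn[t] + eps)))
    return float(np.mean(losses)) if losses else 0.0

# Toy attention: output step 0 aligns to input token 0, step 1 to the span of tokens 1-2.
attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.45, 0.45]])
print(span_attention_loss(attn, {0: (0, 1), 1: (1, 3)}))
```

In training, this term would be added to the usual sequence-to-sequence loss. Rows that already concentrate their attention on the aligned span contribute little to it.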

Keywords: span-level supervised attention, neural semantic parsing
