
Image Captioning

Generating Textual Descriptions of Images

Publication date: 2018
Research language: Arabic
Created by Adam Oudaimah





No English abstract


Artificial intelligence review:
Research summary
This paper develops an attention-based model for generating textual descriptions of images. The model is trained with backpropagation and by maximizing a stochastic variational lower bound. The MS COCO dataset is used for training. The paper uses convolutional neural networks (CNNs) to extract feature-vector representations of images and recurrent neural networks (RNNs) to generate the textual description. It emphasizes the role of attention in human visual systems and shows how the model can correct itself when it generates words that do not match the objects present in the image. The architectures of the convolutional and recurrent networks are explained in detail, along with how the model is trained using the TensorFlow library. Training results on 10,000 images from the MS COCO dataset are reported, with an accuracy of roughly 70%.
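To make the described architecture concrete, below is a minimal TensorFlow/Keras sketch of a soft-attention caption decoder of this kind: a pretrained CNN provides a grid of spatial feature vectors, and a GRU decoder attends over them at every generation step. The class names, layer sizes, choice of GRU, and use of InceptionV3 are illustrative assumptions, not the paper's exact code.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (soft) attention over spatial CNN features."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects CNN features
        self.W2 = tf.keras.layers.Dense(units)   # projects the decoder state
        self.V = tf.keras.layers.Dense(1)        # scalar score per image location

    def call(self, features, hidden):
        # features: (batch, locations, dim) spatial CNN features
        # hidden:   (batch, units) previous decoder state
        hidden_t = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_t)))
        weights = tf.nn.softmax(scores, axis=1)            # where the model "looks"
        context = tf.reduce_sum(weights * features, axis=1)
        return context, weights

class AttentionDecoder(tf.keras.Model):
    """One-step-at-a-time RNN decoder conditioned on an attention context."""
    def __init__(self, embed_dim=256, units=512, vocab_size=5000):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, word_ids, features, hidden):
        # word_ids: (batch, 1) id of the previously generated word
        context, weights = self.attention(features, hidden)
        x = self.embedding(word_ids)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=hidden)
        logits = self.fc(tf.squeeze(output, axis=1))       # (batch, vocab_size)
        return logits, state, weights

# Encoder: a pretrained CNN truncated to its last spatial feature map, e.g.
# InceptionV3, whose 8x8x2048 output can be reshaped to (batch, 64, 2048)
# and fed to the decoder as `features`.
cnn = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
```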
Critical review
The paper is a valuable contribution to attention-based image captioning. It could nevertheless be strengthened by a deeper analysis of the model's performance on different datasets and by a comparison with similar models. More detail on how the model could be adapted to handle highly complex images would also help, as would a deeper analysis of the errors the model makes and how they can be corrected.
Questions related to the research
  1. What technique is used to train the model in this paper?

    The model is trained using backpropagation and by maximizing a stochastic variational lower bound (a minimal training-step sketch follows these questions).

  2. Which dataset is used to train the model?

    The MS COCO dataset is used to train the model.

  3. What accuracy did the model achieve when trained on the MS COCO dataset?

    It achieved an accuracy of roughly 70%.

  4. Which neural networks does the model use to generate the textual description?

    Convolutional neural networks (CNNs) are used to extract feature-vector representations of the images, and recurrent neural networks (RNNs) generate the textual description.
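As a complement to question 1, here is a minimal teacher-forcing training step built on the AttentionDecoder sketched after the research summary. It shows the deterministic soft-attention case trained with plain backpropagation and cross-entropy; padding masks and the stochastic variational-lower-bound variant are omitted for brevity, and the optimizer and default sizes are assumptions.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(decoder, features, captions, units=512):
    # features: (batch, locations, dim) CNN features for a batch of images
    # captions: (batch, T) integer token ids beginning with a <start> token
    batch = captions.shape[0]
    hidden = tf.zeros((batch, units))                 # initial decoder state
    word = tf.expand_dims(captions[:, 0], 1)          # feed the <start> token
    loss = 0.0
    with tf.GradientTape() as tape:
        for t in range(1, captions.shape[1]):
            logits, hidden, _ = decoder(word, features, hidden)
            loss += loss_fn(captions[:, t], logits)
            word = tf.expand_dims(captions[:, t], 1)  # teacher forcing
    grads = tape.gradient(loss, decoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, decoder.trainable_variables))
    return loss / float(captions.shape[1] - 1)        # mean per-word loss
```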


References used
Kelvin Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. 2016.
Zachary C. Lipton, John Berkowitz, Charles Elkan. A Critical Review of Recurrent Neural Networks for Sequence Learning. 2015.
CS231n: Convolutional Neural Networks for Visual Recognition (Stanford course notes).

Read More

Modern web content - news articles, blog posts, educational resources, marketing brochures - is predominantly multimodal. A notable trait is the inclusion of media such as images placed at meaningful locations within a textual narrative. Most often, such images are accompanied by captions - either factual or stylistic (humorous, metaphorical, etc.) - making the narrative more engaging to the reader. While standalone image captioning has been extensively studied, captioning an image based on external knowledge such as its surrounding text remains under-explored. In this paper, we study this new task: given an image and an associated unstructured knowledge snippet, the goal is to generate a contextual caption for the image.
The task of news article image captioning aims to generate descriptive and informative captions for news article images. Unlike conventional image captions that simply describe the content of the image in general terms, news image captions follow journalistic guidelines and rely heavily on named entities to describe the image content, often drawing context from the whole article they are associated with. In this work, we propose a new approach to this task, motivated by caption guidelines that journalists follow. Our approach, Journalistic Guidelines Aware News Image Captioning (JoGANIC), leverages the structure of captions to improve the generation quality and guide our representation design. Experimental results, including detailed ablation studies, on two large-scale publicly available datasets show that JoGANIC substantially outperforms state-of-the-art methods both on caption generation and named entity related metrics.
This research will show a sturdy method to hide a text file into an image using least significant bit algorithm and encrypting this text, which allows to store English and Arabic texts with various sizes and ensure that the text file is delivered correctly and secretly.
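For readers unfamiliar with the technique, the sketch below shows the basic least-significant-bit embedding and extraction idea, with a toy XOR step standing in for the (unspecified) encryption. The 32-bit length header, the XOR key, and the function names are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np
from PIL import Image

def embed(cover_path, out_path, text, key=0x5A):
    """Hide UTF-8 text (English or Arabic) in the LSBs of an RGB image."""
    data = bytes(b ^ key for b in text.encode("utf-8"))      # toy XOR "cipher"
    payload = len(data).to_bytes(4, "big") + data            # 32-bit length header
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))

    img = np.array(Image.open(cover_path).convert("RGB"))
    flat = img.reshape(-1)
    if bits.size > flat.size:
        raise ValueError("message too large for this cover image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits       # overwrite LSBs
    # Save with a lossless format (e.g. PNG) so the LSBs survive.
    Image.fromarray(flat.reshape(img.shape)).save(out_path)

def extract(stego_path, key=0x5A):
    """Recover the hidden text from the stego image."""
    flat = np.array(Image.open(stego_path).convert("RGB")).reshape(-1)
    n = int.from_bytes(np.packbits(flat[:32] & 1).tobytes(), "big")
    body = np.packbits(flat[32:32 + 8 * n] & 1).tobytes()
    return bytes(b ^ key for b in body).decode("utf-8")
```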
We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more than one million news images along with associated news articles, image captions, author information, and other metadata. Unlike the standard image captioning task, news images depict situations where people, locations, and events are of paramount importance. Our proposed method can effectively combine visual and textual features to generate captions with richer information such as events and entities. More specifically, built upon the Transformer architecture, our model is further equipped with novel multi-modal feature fusion techniques and attention mechanisms, which are designed to generate named entities more accurately. Our method utilizes much fewer parameters while achieving slightly better prediction results than competing methods. Our larger and more diverse Visual News dataset further highlights the remaining challenges in captioning news images.
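As a rough illustration of what multi-modal feature fusion with attention can look like in a Transformer decoder, the sketch below lets partial-caption states cross-attend over a concatenation of image-region features and encoded article tokens. This is a generic pattern only, not the Visual News Captioner (or JoGANIC) architecture; all dimensions, names, and the assumption that both modalities are already projected to a common width are illustrative.

```python
import tensorflow as tf

class MultiModalCrossAttention(tf.keras.layers.Layer):
    """Cross-attention of caption states over fused visual + textual memory."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, caption_states, visual_feats, article_feats):
        # caption_states: (batch, T, d_model) partial caption representations
        # visual_feats:   (batch, R, d_model) image-region features
        # article_feats:  (batch, N, d_model) encoded news-article tokens
        memory = tf.concat([visual_feats, article_feats], axis=1)   # fused memory
        attended = self.attn(query=caption_states, value=memory, key=memory)
        return self.norm(caption_states + attended)                 # residual + norm
```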
Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in contrast to the reference-free manner in which humans assess caption quality. In this paper, we report the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. Experiments spanning several corpora demonstrate that our new reference-free metric, CLIPScore, achieves the highest correlation with human judgements, outperforming existing reference-based metrics like CIDEr and SPICE. Information gain experiments demonstrate that CLIPScore, with its tight focus on image-text compatibility, is complementary to existing reference-based metrics that emphasize text-text similarities. Thus, we also present a reference-augmented version, RefCLIPScore, which achieves even higher correlation. Beyond literal description tasks, several case studies reveal domains where CLIPScore performs well (clip-art images, alt-text rating), but also where it is relatively weaker in comparison to reference-based metrics, e.g., news captions that require richer contextual knowledge.
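The core of CLIPScore is simple enough to sketch: given CLIP embeddings of an image and a candidate caption, the score is a rescaled, clipped cosine similarity. The sketch below assumes the two embeddings have already been computed with a CLIP model; the 2.5 rescaling weight follows the CLIPScore paper.

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Reference-free CLIPScore: 2.5 * max(cos(image, caption), 0)."""
    image_emb = image_emb / np.linalg.norm(image_emb)   # unit-normalize
    text_emb = text_emb / np.linalg.norm(text_emb)
    return 2.5 * max(float(image_emb @ text_emb), 0.0)  # clip negatives to 0
```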
