Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

تحسين جيل وتقييم القصص البصرية عبر الاتساق الدلالي

614 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

improving generation visual stories semantic consistency تحسين الجيل القصص البصرية الاتساق الدلالي صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

تعتبر تصور القصة مهمة غير مسجلة تقع عند تقاطع العديد من الاتجاهات البحثية المهمة في كل من رؤية الكمبيوتر ومعالجة اللغات الطبيعية. في هذه المهمة، نظرا لسلسلة من التسميات التوضيحية باللغة الطبيعية التي تنشأ قصة، يجب أن يولد الوكيل سلسلة من الصور التي تتوافق مع التسميات التوضيحية. قدم العمل السابق نماذج تائحة تكرار تتفوق نماذج توليف النص إلى الصورة في هذه المهمة. ومع ذلك، هناك مجال لتحسين الصور التي تم إنشاؤها من حيث الجودة البصرية والتماسك والأهمية. نقدم عددا من التحسينات إلى نهج النمذجة السابقة، بما في ذلك (1) إضافة إطار تعليمي مزدوج يستخدم تقسيم الفيديو لتعزيز المحاذاة الدلالية بين القصة والصور التي تم إنشاؤها، (2) آلية تحويل النسخ المتوسطة تصور القصة، و (3) من المحولات المستندة إلى مارت إلى التفاعلات المعقدة بين الإطارات. نقدم دراسات الاجتثاث لإظهار تأثير كل تقنيات من هذه التقنيات على القوة المنتجة للنموذج لكل من الصور الفردية وكذلك السرد بأكمله. علاوة على ذلك، بسبب تعقيد الطبيعة والطبيعة الإندنية للمهمة، لا تعكس مقاييس التقييم القياسية الأداء بدقة. لذلك، فإننا نقدم أيضا استكشاف مقاييس التقييم للنموذج، ركز على جوانب الإطارات التي تم إنشاؤها مثل وجود / جودة الشخصيات الناتجة، والأهمية التعيينات، وتنوع الصور التي تم إنشاؤها. نقدم أيضا تجارب الارتباط لمقاييسنا الآلية المقترحة مع التقييمات البشرية.

Story visualization is an underexplored task that falls at the intersection of many important research directions in both computer vision and natural language processing. In this task, given a series of natural language captions which compose a story, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task. However, there is room for improvement of generated images in terms of visual quality, coherence and relevance. We present a number of improvements to prior modeling approaches, including (1) the addition of a dual learning framework that utilizes video captioning to reinforce the semantic alignment between the story and generated images, (2) a copy-transform mechanism for sequentially-consistent story visualization, and (3) MART-based transformers to model complex interactions between frames. We present ablation studies to demonstrate the effect of each of these techniques on the generative power of the model for both individual images as well as the entire narrative. Furthermore, due to the complexity and generative nature of the task, standard evaluation metrics do not accurately reflect performance. Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images. We also present correlation experiments of our proposed automated metrics with human evaluations.

References used

https://aclanthology.org/

rate research

Diversity and Consistency: Exploring Visual Question-Answer Pair Generation

530 - Association for Computation Linguistics 2021 مقالة

Although showing promising values to downstream applications, generating question and answer together is under-explored. In this paper, we introduce a novel task that targets question-answer pair generation from visual images. It requires not only ge nerating diverse question-answer pairs but also keeping the consistency of them. We study different generation paradigms for this task and propose three models: the pipeline model, the joint model, and the sequential model. We integrate variational inference into these models to achieve diversity and consistency. We also propose region representation scaling and attention alignment to improve the consistency further. We finally devise an evaluator as a quantitative metric for consistency. We validate our approach on two benchmarks, VQA2.0 and Visual-7w, by automatically and manually evaluating diversity and consistency. Experimental results show the effectiveness of our models: they can generate diverse or consistent pairs. Moreover, this task can be used to improve visual question generation and visual question answering.

exploring visual question-answer exploring visual question-answer pair generation استكشاف إجابة السؤال المرئي استكشاف البصرية جيل زوج الإجابة الإجابة صناعة حمض الفوسفور المزيد..

Controlling Dialogue Generation with Semantic Exemplars

646 - Association for Computation Linguistics 2021 مقالة

Dialogue systems pretrained with large language models generate locally coherent responses, but lack fine-grained control over responses necessary to achieve specific goals. A promising method to control response generation is exemplar-based generati on, in which models edit exemplar responses that are retrieved from training data, or hand-written to strategically address discourse-level goals, to fit new dialogue contexts. We present an Exemplar-based Dialogue Generation model, EDGE, that uses the semantic frames present in exemplar responses to guide response generation. We show that controlling dialogue generation based on the semantic frames of exemplars improves the coherence of generated responses, while preserving semantic meaning and conversation goals present in exemplar responses.

controlling dialogue generation dialogue generation exemplar-based dialogue generation السيطرة على جيل الحوار جيل الحوار توليد الحوار المستندة إلى Exemplar صناعة حمض الفوسفور المزيد..

Controllable Semantic Parsing via Retrieval Augmentation

691 - Association for Computation Linguistics 2021 مقالة

In practical applications of semantic parsing, we often want to rapidly change the behavior of the parser, such as enabling it to handle queries in a new domain, or changing its predictions on certain targeted queries. While we can introduce new trai ning examples exhibiting the target behavior, a mechanism for enacting such behavior changes without expensive model re-training would be preferable. To this end, we propose ControllAble Semantic Parser via Exemplar Retrieval (CASPER). Given an input query, the parser retrieves related exemplars from a retrieval index, augments them to the query, and then applies a generative seq2seq model to produce an output parse. The exemplars act as a control mechanism over the generic generative model: by manipulating the retrieval index or how the augmented query is constructed, we can manipulate the behavior of the parser. On the MTOP dataset, in addition to achieving state-of-the-art on the standard setup, we show that CASPER can parse queries in a new domain, adapt the prediction toward the specified patterns, or adapt to new semantic schemas without having to further re-train the model.

controllable semantic parsing التحليل الدلالي يمكن السيطرة عليها صناعة حمض الفوسفور

Measuring and Improving Consistency in Pretrained Language Models

862 - Association for Computation Linguistics 2021 مقالة

Abstract Consistency of a model---that is, the invariance of its behavior under meaning-preserving alternations in its input---is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel?, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel?, we show that the consistency of all PLMs we experiment with is poor--- though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.1

pretrained language models pretrained language نماذج اللغة المحددة مسبقا اللغة المحددة صناعة حمض الفوسفور

Improving the extraction of audio features In audio-visual Arabic systems

1791 - Aِl-Baath University 2017 ورقة بحثية

The audio-visual speech recognition systems that rely on speech and movement of the lips of the speaker of the most important speech recognition systems. Many different techniques have developed in terms of the methods used in the feature extracti on and classification methods. Research proposes the establishment of a system to identify isolated words based audio features extracted from videos pronunciations of words in Arabic in an environment free of noise, and then add the energy and Temporal derivative components in extracting features of the method Mel Frequency Cepstral Coefficient (MFCC) stage.

Features extraction MFCC نماذج ماركوف المخفية التعرف على الكلام استخراج السمات خوارزمية معاملات تردد الميل المشتقات التفاضلية مكون الطاقة Speech recognition Markov Hidden models Temporal derivatives energy component المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

تحسين جيل وتقييم القصص البصرية عبر الاتساق الدلالي

Ask ChatGPT about the research

Read More

suggested questions