
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: broaden the vision, visual commonsense reasoning, geo-diverse visual commonsense


Abstract

Commonsense is defined as the knowledge on which everyone agrees. However, certain types of commonsense knowledge are correlated with culture and geographic location and are shared only locally. For example, the scenes of wedding ceremonies vary across regions because of different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard benchmark with images primarily from Western regions. We then evaluate how well the trained models generalize to answering the questions in GD-VCR. We find that the performance of both models on non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than on the Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. Dataset and code are released at https://github.com/WadeYin9712/GD-VCR.
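The regional comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the record format, and the toy records below are all hypothetical, and the numbers are made up for illustration only. The sketch assumes each evaluated question is reduced to a (region, is_correct) pair, computes accuracy per region, and reports the gap relative to a reference region.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """Return {region: accuracy} from (region, is_correct) pairs.

    The record format is an assumption for this sketch, not the
    format used by the GD-VCR release.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for region, is_correct in records:
        totals[region] += 1
        correct[region] += int(is_correct)
    return {region: correct[region] / totals[region] for region in totals}


def gap_to_reference(accuracy, reference="West"):
    """Return {region: reference accuracy minus region accuracy}."""
    ref = accuracy[reference]
    return {r: ref - a for r, a in accuracy.items() if r != reference}


# Toy, hand-written records (hypothetical; not results from the paper).
records = [
    ("West", True), ("West", True), ("West", True), ("West", False),
    ("East Asia", True), ("East Asia", False),
    ("East Asia", True), ("East Asia", False),
    ("Africa", True), ("Africa", False),
    ("Africa", False), ("Africa", False),
]

acc = accuracy_by_region(records)
gaps = gap_to_reference(acc)
for region, gap in gaps.items():
    print(f"{region}: accuracy {acc[region]:.2f}, gap to West {gap:.2f}")
```

A positive gap means the model answers fewer questions correctly for that region than for the reference region; the same grouping could be repeated per scenario type (for example, weddings or festivals) to look at where the gap concentrates.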



Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge

Association for Computational Linguistics, 2021 (article)

The limits of applicability of vision-and-language models are defined by the coverage of their training data. Tasks like vision question answering (VQA) often require commonsense and factual information beyond what can be learned from task-specific datasets. This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers. We use an auxiliary training objective that encourages the learned representations to align with graph embeddings of matching entities in a KB. We empirically study the relevance of various KBs to multiple tasks and benchmarks. The technique brings clear benefits to knowledge-demanding question answering tasks (OK-VQA, FVQA) by capturing semantic and relational knowledge absent from existing models. More surprisingly, the technique also benefits visual reasoning tasks (NLVR2, SNLI-VE). We perform probing experiments and show that the injection of additional knowledge regularizes the space of embeddings, which improves the representation of lexical and semantic similarities. The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.

Keywords: supplemental knowledge, vision-and-language models, exploring

Improving Unsupervised Commonsense Reasoning Using Knowledge-Enabled Natural Language Inference

Association for Computational Linguistics, 2021 (article)

Recent methods based on pre-trained language models have shown strong supervised performance on commonsense reasoning. However, they rely on expensive data annotation and time-consuming training. Thus, we focus on unsupervised commonsense reasoning. We show the effectiveness of using a common framework, Natural Language Inference (NLI), to solve diverse commonsense reasoning tasks. By leveraging transfer learning from large NLI datasets, and injecting crucial knowledge from commonsense sources such as ATOMIC 2020 and ConceptNet, our method achieved state-of-the-art unsupervised performance on two commonsense reasoning tasks: WinoWhy and CommonsenseQA. Further analysis demonstrated the benefits of multiple categories of knowledge, but problems about quantities and antonyms are still challenging.

Keywords: knowledge-enabled natural language, language inference

Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

Association for Computational Linguistics, 2021 (article)

Visual dialog is the task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain multiple reasonable answers. As a result, our proposed model significantly improves reasoning capability compared to baseline methods and outperforms the state-of-the-art approaches on the VisDial v1.0 dataset. The source code is available at https://github.com/gicheonkang/SGLKT-VisDial.

Keywords: sparse graph learning, graph learning

Joint Passage Ranking for Diverse Multi-Answer Retrieval

Association for Computational Linguistics, 2021 (article)

We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a given question. This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer. Prior work focusing on single-answer retrieval is limited as it cannot reason about the set of passages jointly. In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms. Compared to prior approaches, JPR achieves significantly better answer coverage on three multi-answer datasets. When combined with downstream question answering, the improved retrieval enables larger answer generation models since they need to consider fewer passages, establishing a new state-of-the-art.

Keywords: ranking for diverse, diverse multi-answer retrieval, joint passage ranking

Herbert Marcuse's Critical Vision of Art

Tishreen University, 2014 (research paper)

This research aims to dismantle the formative structure of Marcuse's critical-analytical vision of how art, through imagination (or what he called the new sensitivity), can play a part in revolutionizing awareness and forming perception, working with new tools of knowledge, with aesthetic education as the main driver, and with a new language, to create a new world at the level of both thought and reality. It considers this in a world made possible by a rational, technologically advanced civilization, in which the overall process of producing necessities, the policies of capital, market volatility, the means of mass communication, methods of advertising, and so on reinforce the foundations of an entire system of control, coordination, and domination. That system strips critical protest and opposition of all their weapons in advance, falsifies awareness, reduces the inner dimension of culture and thought, and creates countless false needs. It also converts individual selves, as a whole, into things: tools running within a huge productive totality that derives its raison d'être, its continuation, its strength, and the reach of its dominance from its enormous productivity and from its achievements at the various levels of life.

Keywords: Art, New Sensitivity, Objectification, Revolutionizing awareness, Aesthetic Education, Alienation, Play of imagination, Closed world of construction
