
Commonsense is defined as the knowledge on which everyone agrees. However, certain types of commonsense knowledge are correlated with culture and geographic locations, and they are only shared locally. For example, the scenes of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT, trained on VCR, a standard benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for the Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. Dataset and code are released at https://github.com/WadeYin9712/GD-VCR.
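The region-wise evaluation described above amounts to a grouped accuracy computation over the QA pairs. A minimal sketch in Python, assuming each GD-VCR example exposes hypothetical `region`, `label`, and input fields and that a `predict` function wraps the trained model (both are assumptions for illustration, not the released code's interface):

```python
from collections import defaultdict

def region_accuracy(examples, predict):
    """Group QA accuracy by region to surface the Western vs. non-Western gap."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:  # ex: dict with hypothetical keys
        pred = predict(ex["image"], ex["question"], ex["answer_choices"])
        total[ex["region"]] += 1
        correct[ex["region"]] += int(pred == ex["label"])
    return {region: correct[region] / total[region] for region in total}
```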
We present new results for the problem of sequence metaphor labeling, using the recently developed Visibility Embeddings. We show that concatenating such embeddings to the input of a BiLSTM yields consistent and significant improvements at almost no cost, and we present further improved results when visibility embeddings are combined with BERT.
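As a rough illustration of the architecture described above, the following PyTorch sketch concatenates an extra embedding stream with the word embeddings before a BiLSTM token tagger; the dimensions, layer sizes, and two-tag output are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MetaphorTagger(nn.Module):
    """Word embeddings concatenated with a second embedding stream,
    fed to a BiLSTM, followed by a per-token metaphor/literal classifier."""
    def __init__(self, word_dim=300, extra_dim=100, hidden=256, num_tags=2):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim + extra_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, word_emb, extra_emb):
        x = torch.cat([word_emb, extra_emb], dim=-1)  # (B, T, word_dim + extra_dim)
        h, _ = self.bilstm(x)                         # (B, T, 2 * hidden)
        return self.out(h)                            # per-token tag logits
```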
This paper studies zero-shot cross-lingual transfer of vision-language models. Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextual multilingual multimodal embeddings. Under a zero-shot setting, we empirically demonstrate that performance degrades significantly when we query the multilingual text-video model with non-English sentences. To address this problem, we introduce a multilingual multimodal pre-training strategy and collect a new multilingual instructional video dataset (Multi-HowTo100M) for pre-training. Experiments on VTT show that our method significantly improves video search in non-English languages without additional annotations. Furthermore, when multilingual annotations are available, our method outperforms recent baselines by a large margin in multilingual text-to-video search on VTT and VATEX, as well as in multilingual text-to-image search on Multi30K. Our model and Multi-HowTo100M are available at http://github.com/berniebear/Multi-HT100M.
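Once queries and videos are mapped into the shared multilingual multimodal space, retrieval reduces to nearest-neighbour search over embedding similarities. A minimal sketch, assuming hypothetical pre-computed embeddings from the shared encoder; in the zero-shot setting the same function would simply be queried with non-English sentence embeddings:

```python
import torch
import torch.nn.functional as F

def retrieve(text_emb, video_embs, k=5):
    """Rank videos by cosine similarity to a query embedding.
    text_emb: (D,) query embedding; video_embs: (N, D) candidate embeddings."""
    sims = F.cosine_similarity(text_emb.unsqueeze(0), video_embs, dim=-1)  # (N,)
    return sims.topk(k).indices  # indices of the top-k videos

# toy usage with random vectors standing in for encoder outputs
print(retrieve(torch.randn(512), torch.randn(100, 512)))
```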
Understanding how news media frame political issues is important due to its impact on public attitudes, yet hard to automate. Computational approaches have largely focused on classifying the frame of a full news article while framing signals are often subtle and local. Furthermore, automatic news analysis is a sensitive domain, and existing classifiers lack transparency in their predictions. This paper addresses both issues with a novel semi-supervised model, which jointly learns to embed local information about the events and related actors in a news article through an auto-encoding framework, and to leverage this signal for document-level frame classification. Our experiments show that: our model outperforms previous models of frame prediction; we can further improve performance with unlabeled training data leveraging the semi-supervised nature of our model; and the learnt event and actor embeddings intuitively corroborate the document-level predictions, providing a nuanced and interpretable article frame representation.
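The semi-supervised objective described above can be read as a reconstruction term on all articles plus a classification term on labeled ones. A minimal PyTorch-style sketch, where the loss forms and the weight `alpha` are assumptions for illustration rather than the paper's exact objective:

```python
import torch.nn.functional as F

def joint_loss(recon, target, frame_logits, frame_label=None, alpha=1.0):
    """Every article contributes a reconstruction term for the event/actor
    auto-encoder; only labeled articles add a document-level frame
    classification term weighted by a hypothetical alpha."""
    loss = F.mse_loss(recon, target)
    if frame_label is not None:  # labeled example
        loss = loss + alpha * F.cross_entropy(frame_logits, frame_label)
    return loss
```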
Vision-and-language navigation is a task that requires an agent to navigate through a 3D environment based on natural language instructions. One key challenge in this task is to ground instructions in the visual information that the agent currently perceives. Most existing work employs soft attention over individual words to locate the part of the instruction required for the next action. However, different words have different functions in a sentence (e.g., modifiers convey attributes, verbs convey actions). Syntactic information such as dependencies and phrase structures can help the agent locate important parts of the instruction. Hence, in this paper, we propose a navigation agent that utilizes syntax information derived from a dependency tree to enhance alignment between the instruction and the current visual scene. Empirically, our agent outperforms the baseline model that does not use syntax information on the Room-to-Room dataset, especially in unseen environments. Moreover, our agent achieves a new state of the art on the Room-Across-Room dataset, which contains instructions in three languages (English, Hindi, and Telugu). We also show, via qualitative visualizations, that our agent is better at aligning instructions with the current visual information.
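A dependency parse of an instruction already exposes the verb-centred structure such an agent can exploit. The sketch below uses spaCy (not necessarily the parser used in the paper) to pull out verbs and their dependents as candidate anchors for instruction-to-scene alignment:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English dependency parser

def action_phrases(instruction):
    """Collect verbs and selected dependents from the dependency tree,
    a rough stand-in for the syntax signal the agent could attend to."""
    doc = nlp(instruction)
    phrases = []
    for tok in doc:
        if tok.pos_ == "VERB":
            deps = [child.text for child in tok.children
                    if child.dep_ in ("dobj", "prep", "advmod")]
            phrases.append((tok.text, deps))
    return phrases

print(action_phrases("Turn left at the stairs and stop near the red door."))
```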
The interpretation of the knowledge learned by multi-head self-attention in transformers has been one of the central questions in NLP. However, most prior work has focused on models trained for uni-modal tasks, e.g., machine translation. In this paper, we examine masked self-attention in a multi-modal transformer trained for the task of image captioning. In particular, we test whether the multi-modality of the task objective affects the learned attention patterns. Our visualisations of masked self-attention demonstrate that (i) it can learn general linguistic knowledge of the textual input, and (ii) its attention patterns incorporate artefacts from the visual modality even though it has never accessed it directly. We compare our transformer's attention patterns with masked attention in distilgpt-2, tested for uni-modal text generation of image captions. Based on the maps of extracted attention weights, we argue that masked self-attention in the image-captioning transformer seems to be enhanced with semantic knowledge from images, exemplifying joint language-and-vision information in its attention patterns.
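Extracting masked self-attention maps for this kind of analysis is straightforward with off-the-shelf tooling. The sketch below pulls the attention weights of distilgpt-2 for a caption-like sentence, i.e. the uni-modal side of the comparison; the multi-modal captioning transformer itself is not reproduced here:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModel.from_pretrained("distilgpt2")

inputs = tok("a man riding a wave on top of a surfboard", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer, each of shape
# (batch, heads, seq_len, seq_len): the masked self-attention maps to plot.
print(len(out.attentions), out.attentions[0].shape)
```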
The limits of applicability of vision-and-language models are defined by the coverage of their training data. Tasks like visual question answering (VQA) often require commonsense and factual information beyond what can be learned from task-specific datasets. This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers. We use an auxiliary training objective that encourages the learned representations to align with graph embeddings of matching entities in a KB. We empirically study the relevance of various KBs to multiple tasks and benchmarks. The technique brings clear benefits to knowledge-demanding question answering tasks (OK-VQA, FVQA) by capturing semantic and relational knowledge absent from existing models. More surprisingly, the technique also benefits visual reasoning tasks (NLVR2, SNLI-VE). We perform probing experiments and show that the injection of additional knowledge regularizes the space of embeddings, which improves the representation of lexical and semantic similarities. The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
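One way to realize such an auxiliary alignment objective is a learned projection from the transformer's entity representations onto the KB graph-embedding space, trained to agree with the matched entity. A sketch under these assumptions (the projection size and loss form are illustrative, not necessarily the paper's exact setup):

```python
import torch.nn as nn
import torch.nn.functional as F

class KBAlignment(nn.Module):
    """Auxiliary loss: project the transformer's entity representation and
    pull it toward the matching KB graph embedding (cosine alignment)."""
    def __init__(self, model_dim=768, kb_dim=200):
        super().__init__()
        self.proj = nn.Linear(model_dim, kb_dim)

    def forward(self, entity_repr, graph_emb):
        pred = F.normalize(self.proj(entity_repr), dim=-1)   # (B, kb_dim)
        target = F.normalize(graph_emb, dim=-1)              # (B, kb_dim)
        return (1 - (pred * target).sum(-1)).mean()          # added to the task loss
```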
The study of geological structures exposed at the Earth's surface is of great importance in general, and in engineering design and construction in particular. In this research, we used 2,206 images with 12 labels to recognize geological structures based on the Inception-v3 model. Both grayscale and color images were used in the model. A convolutional neural network (CNN) model was also built, and the k-nearest neighbors (KNN) algorithm, an artificial neural network (ANN), and extreme gradient boosting (XGBoost) were applied to classify geological structures based on features extracted with the open-source computer vision library (OpenCV). Finally, the performance of the five methods was compared; the results show that KNN, ANN, and XGBoost performed poorly, with accuracies below 40.0%, while the CNN suffered from overfitting. The models trained with transfer learning were highly effective on the small dataset of geological structure images, with the two best models reaching accuracies of 83.3% and 90.0%, respectively. This indicates that texture is the key feature in this research. Transfer learning based on a deep learning model can effectively extract features from small geological-structure datasets and is robust for classifying geological structure images.
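A typical transfer-learning setup for the Inception-v3 experiments described above freezes the ImageNet-pretrained backbone and retrains only a new 12-class head. The sketch below uses torchvision and is an assumption about the setup, not the study's exact configuration:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained Inception-v3 (expects 299x299 inputs; in
# train() mode it also returns auxiliary logits).
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False                       # freeze the pretrained backbone

model.fc = nn.Linear(model.fc.in_features, 12)        # new 12-label classifier head
# Only model.fc parameters are trainable, so the small geological-structure
# dataset fine-tunes the head while reusing the pretrained features.
```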
This research aims to study the requirements for implementing strategic planning in the Port Governmental Company of Tartous, and to assess the extent to which some of these requirements are present in the company.
The research aims to examine the relationship between organizational culture and knowledge management practices at Tishreen University. The researcher distributed a questionnaire to a sample drawn from the university's colleges; of the (222) questionnaires, (205) were recovered, of which (158) were valid for use. To test the relationship, the researcher used the one-sample Student's t-test as well as the Pearson correlation coefficient.
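For reference, both tests named above are available in SciPy; the sketch below runs them on made-up scores (the population mean of 3.0 assumes a 5-point Likert-scale midpoint and is not taken from the study):

```python
import numpy as np
from scipy import stats

# Toy illustration only: hypothetical questionnaire scores, not the study's data.
culture = np.array([3.8, 4.1, 3.5, 4.0, 3.9])      # organizational-culture scores
knowledge = np.array([3.6, 4.2, 3.4, 3.9, 4.0])    # knowledge-management scores

t_stat, p_ttest = stats.ttest_1samp(culture, popmean=3.0)  # one-sample t-test vs. midpoint
r, p_corr = stats.pearsonr(culture, knowledge)             # Pearson correlation
print(t_stat, p_ttest, r, p_corr)
```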
