Commonsense is defined as the knowledge on which everyone agrees. However, certain types of commonsense knowledge are correlated with culture and geographic location and are only shared locally. For example, the scenes of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard benchmark whose images come primarily from Western regions. We then evaluate how well the trained models generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for the Western region. We analyze the reasons behind this disparity and find that the performance gap is larger on QA pairs that: 1) concern culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. The dataset and code are released at https://github.com/WadeYin9712/GD-VCR.
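A minimal sketch of the per-region evaluation described above, assuming each prediction record carries a region label and the model's chosen answer index. The file name and field names ("region", "prediction", "answer_label") are illustrative assumptions, not the dataset's actual schema.

```python
# Group GD-VCR predictions by region and compute per-region accuracy,
# so the Western vs. non-Western gap can be measured directly.
import json
from collections import defaultdict

def accuracy_by_region(pred_path):
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(pred_path) as f:
        for line in f:
            rec = json.loads(line)   # e.g. {"region": "East Asia", ...} (hypothetical schema)
            total[rec["region"]] += 1
            correct[rec["region"]] += int(rec["prediction"] == rec["answer_label"])
    return {r: correct[r] / total[r] for r in total}

# Example: compare Western accuracy against the lowest-scoring region.
# scores = accuracy_by_region("visualbert_gdvcr_preds.jsonl")
# gap = scores["West"] - min(scores.values())
```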
This paper presents a study that compares the non-manual markers of polar and wh-questions with those of statements in Kazakh-Russian Sign Language (KRSL), using a dataset collected for NLP tasks. The primary focus of the study is to demonstrate the utility of computer vision solutions for the linguistic analysis of non-manuals in sign languages, although additional corrections are required to account for biases in their output. To this end, we analyze recordings of 10 triplets of sentences produced by 9 native signers, using both manual annotation and computer vision solutions (such as OpenFace). We utilize and improve the computer vision solution, and briefly describe the results of the linguistic analysis.
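A minimal sketch of how OpenFace's per-frame CSV output could be summarized for such an analysis, assuming the OpenFace 2.x schema (action-unit intensities such as AU01_r/AU04_r and head-pose angles such as pose_Rx). The choice of features and the per-recording averaging are illustrative, not the paper's exact pipeline.

```python
# Summarize brow activity and head pose from one OpenFace output CSV,
# keeping only frames where the face was successfully tracked.
import pandas as pd

def summarize_nonmanuals(csv_path):
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()    # OpenFace pads column names with spaces
    df = df[df["success"] == 1]            # drop frames without a tracked face
    return {
        "brow_raise": df[["AU01_r", "AU02_r"]].mean().mean(),  # inner/outer brow raiser
        "brow_lower": df["AU04_r"].mean(),                     # brow lowerer
        "head_pitch": df["pose_Rx"].mean(),                    # forward/backward tilt
    }

# Example: contrast a polar question with its matching statement
# (file names are hypothetical).
# q = summarize_nonmanuals("signer3_polar_q.csv")
# s = summarize_nonmanuals("signer3_statement.csv")
```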
This paper studies zero-shot cross-lingual transfer of vision-language models. Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextual multilingual multimodal embeddings. Under a zero-shot setting, we empirically demonstrate that performance degrades significantly when we query the multilingual text-video model with non-English sentences. To address this problem, we introduce a multilingual multimodal pre-training strategy and collect a new multilingual instructional video dataset (Multi-HowTo100M) for pre-training. Experiments on VTT show that our method significantly improves video search in non-English languages without additional annotations. Furthermore, when multilingual annotations are available, our method outperforms recent baselines by a large margin in multilingual text-to-video search on VTT and VATEX, as well as in multilingual text-to-image search on Multi30K. Our model and Multi-HowTo100M are available at http://github.com/berniebear/Multi-HT100M.
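A minimal sketch of the text-to-video retrieval setup, assuming queries and video clips have already been encoded into a shared multilingual embedding space (the encoders themselves are the paper's contribution and are stubbed out here with random stand-ins).

```python
# Rank videos by cosine similarity to each text query and report Recall@k,
# the standard retrieval metric for this task.
import numpy as np

def recall_at_k(text_emb, video_emb, k=10):
    """text_emb[i] and video_emb[i] are the matching pair; both L2-normalized."""
    sims = text_emb @ video_emb.T            # cosine similarity matrix
    ranks = (-sims).argsort(axis=1)          # best-scoring videos first
    hits = [i in ranks[i, :k] for i in range(len(text_emb))]
    return float(np.mean(hits))

# Illustrative usage with random embeddings in place of the learned ones:
# t = np.random.randn(100, 512); t /= np.linalg.norm(t, axis=1, keepdims=True)
# v = np.random.randn(100, 512); v /= np.linalg.norm(v, axis=1, keepdims=True)
# print(recall_at_k(t, v, k=10))
```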
Vision-language navigation requires an agent to navigate through a 3D environment based on natural language instructions. One key challenge in this task is grounding instructions in the visual information that the agent currently perceives. Most existing work employs soft attention over individual words to locate the parts of the instruction relevant to the next action. However, different words have different functions in a sentence (e.g., modifiers convey attributes, verbs convey actions). Syntactic information such as dependencies and phrase structures can help the agent locate the important parts of an instruction. Hence, in this paper, we propose a navigation agent that utilizes syntax information derived from a dependency tree to enhance the alignment between the instruction and the current visual scene. Empirically, our agent outperforms a baseline model that does not use syntax information on the Room-to-Room dataset, especially in unseen environments. Moreover, our agent achieves a new state of the art on the Room-Across-Room dataset, which contains instructions in 3 languages (English, Hindi, and Telugu). We also show, via qualitative visualizations, that our agent is better at aligning instructions with the current visual information.
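A minimal sketch of deriving the kind of syntax signal described above: parsing an instruction into a dependency tree and pulling out action verbs and their modifiers. spaCy is used purely for illustration; the paper's agent consumes such dependency information to reweight attention over instruction tokens.

```python
# Extract action verbs and attribute/manner modifiers from a navigation
# instruction via its dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")

def syntax_cues(instruction):
    doc = nlp(instruction)
    verbs = [tok.text for tok in doc if tok.pos_ == "VERB"]
    mods = [(tok.text, tok.head.text) for tok in doc
            if tok.dep_ in ("amod", "advmod")]   # modifier -> head word pairs
    return verbs, mods

# Illustrative: syntax_cues("Turn left and stop at the wooden door.")
# surfaces the action verbs "Turn"/"stop" and modifiers such as ("wooden", "door").
```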
The interpretation of the knowledge learned by multi-head self-attention in transformers has been one of the central questions in NLP. However, most work has focused on models trained for uni-modal tasks, e.g., machine translation. In this paper, we examine masked self-attention in a multi-modal transformer trained for the task of image captioning. In particular, we test whether the multi-modality of the task objective affects the learned attention patterns. Our visualisations of masked self-attention demonstrate that (i) it can learn general linguistic knowledge of the textual input, and (ii) its attention patterns incorporate artefacts from the visual modality even though it never accesses it directly. We compare our transformer's attention patterns with masked attention in distilgpt-2 tested for uni-modal text generation of image captions. Based on the maps of extracted attention weights, we argue that masked self-attention in the image-captioning transformer appears to be enhanced with semantic knowledge from images, exemplifying joint language-and-vision information in its attention patterns.
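A minimal sketch of extracting masked self-attention maps of the kind referenced above, using the Hugging Face transformers library with distilgpt-2 (the uni-modal baseline in the comparison). A multi-modal captioning transformer would be probed the same way, given access to its attention tensors; the input caption here is an arbitrary example.

```python
# Run a caption through distilgpt-2 and plot one head's attention map.
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tok("a man riding a horse on a beach", return_tensors="pt")
out = model(**inputs, output_attentions=True)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer.
attn = out.attentions[-1][0, 0].detach().numpy()   # last layer, first head
labels = tok.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(labels)), labels, rotation=90)
plt.yticks(range(len(labels)), labels)
plt.title("distilgpt-2 masked self-attention (last layer, head 1)")
plt.show()
```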
The limits of applicability of vision-and-language models are defined by the coverage of their training data. Tasks like visual question answering (VQA) often require commonsense and factual information beyond what can be learned from task-specific datasets. This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers. We use an auxiliary training objective that encourages the learned representations to align with graph embeddings of matching entities in a KB. We empirically study the relevance of various KBs to multiple tasks and benchmarks. The technique brings clear benefits to knowledge-demanding question answering tasks (OK-VQA, FVQA) by capturing semantic and relational knowledge absent from existing models. More surprisingly, the technique also benefits visual reasoning tasks (NLVR2, SNLI-VE). We perform probing experiments and show that the injection of additional knowledge regularizes the space of embeddings, which improves the representation of lexical and semantic similarities. The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
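A minimal sketch of an auxiliary alignment objective of the kind described above, assuming we already have (i) the transformer's pooled representation for an entity mention and (ii) a pretrained KB graph embedding for that entity. The linear projection and cosine loss are illustrative assumptions, not the paper's exact formulation.

```python
# Auxiliary loss that pulls projected mention representations toward the
# matching entity's graph embedding, added on top of the main task loss.
import torch.nn as nn
import torch.nn.functional as F

class KBAlignmentLoss(nn.Module):
    def __init__(self, model_dim=768, kb_dim=200):
        super().__init__()
        self.proj = nn.Linear(model_dim, kb_dim)  # map model space -> KB space

    def forward(self, mention_repr, graph_emb):
        # Encourage the projected mention representation to point in the same
        # direction as the entity's graph embedding.
        pred = self.proj(mention_repr)
        return (1 - F.cosine_similarity(pred, graph_emb, dim=-1)).mean()

# Combined objective (lambda_kb is a hypothetical weighting hyperparameter):
# total_loss = task_loss + lambda_kb * KBAlignmentLoss()(mention_repr, graph_emb)
```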
This study aims to assess the availability of the dimensions of organizational intelligence in the private insurance companies on the Syrian coast, and to examine the nature and strength of the relationship between the dimensions of organizational intelligence and performance. To achieve this, three hypotheses were formulated. The researcher used a questionnaire to collect the data, which was analyzed using statistical tests, the most important of which were the one-sample t-test, the Pearson correlation test, and simple regression. The researcher reached several results, the most important of which are: organizational intelligence in the studied companies is evaluated as good; there is a positive relationship between the dimensions of organizational intelligence and performance; and organizational intelligence has a statistically significant effect on performance in the companies under study.
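A minimal sketch of the three statistical tests named above, run on illustrative survey data: Likert-scale scores for organizational intelligence (oi) and performance (perf). The simulated data, sample size, and the test value of 3 (the scale midpoint) are assumptions for demonstration only.

```python
# One-sample t-test, Pearson correlation, and simple regression with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
oi = rng.normal(3.8, 0.6, 120)              # stand-in questionnaire scores
perf = 0.7 * oi + rng.normal(0, 0.4, 120)   # stand-in performance scores

t, p_t = stats.ttest_1samp(oi, popmean=3)   # is the mean above the midpoint?
r, p_r = stats.pearsonr(oi, perf)           # strength of the relationship
reg = stats.linregress(oi, perf)            # simple regression of perf on oi

print(f"one-sample t: t={t:.2f}, p={p_t:.3g}")
print(f"Pearson r={r:.2f}, p={p_r:.3g}")
print(f"regression: perf = {reg.slope:.2f}*oi + {reg.intercept:.2f}")
```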
Reconstructing 3D human poses from a single 2D image is a problem that poses a challenge to many researchers. In recent years, there has been a rising trend toward analyzing the 3D geometry of objects, including their shapes and poses, rather than merely producing bounding boxes. 3D geometric reasoning provides richer information about the scene for subsequent high-level tasks such as scene understanding, augmented reality, and human-computer interaction, and it also improves object detection [3], [4]. 3D reconstruction has therefore been a well-studied problem, and many practically applicable techniques exist, such as structure from motion, multi-view stereo systems, and depth sensors, but these techniques are limited in some scenarios. In this paper, we show how the problem has been approached over the past few decades, analyze recent developments in the field, and discuss potential directions for future research.
This research aims to study the requirements for implementing strategic planning in the Port Governmental Company of Tartous, and to assess the extent to which some of these requirements are already present in the company.
This research aims to examine the relationship between organizational culture and knowledge management practices at Tishreen University. The researcher distributed (222) questionnaires to a sample of the University's colleges, of which (205) were recovered and (158) were valid for analysis. To test the relationship, the researcher used the one-sample Student's t-test as well as the Pearson correlation coefficient.