
A Semantic Approach for Improving Scene Understanding

Original title (Arabic): مقارنة دلالية لتحسين فهم المشهد

Publication date: 2015
Language: Arabic
Created by Shamra Editor





People live in widely varying environments, yet they can understand the scenes around them at a glance. To do this, they rely on a highly developed ability to process visual data efficiently and to connect it to broad prior knowledge about what they expect to see. Computers are not yet capable of this; they still fall short of high-level scene understanding. Most research treats scene understanding as an ordinary classification problem, in which scenes are simply assigned to a limited set of predefined categories (forest, city, garden). Such work typically relies on classification or machine learning algorithms, which limits its ability to understand scenes and reduces its practical usefulness, because these algorithms require a training phase. Some studies try to exploit the knowledge encoded in ontologies to reach higher-level scene understanding, but they remain restricted to specific domains. In this thesis we attempt to understand scene images without any prior knowledge of their domain. Rather than treating this as an ordinary classification problem, we extract high-level concepts from scene images. These concepts not only represent the objects in the scene but also reflect its places and events. To this end, we develop a novel algorithm named SMHITS, which relies on a semantically rich common-sense knowledge base to extract the concepts associated with a primitive group of concepts. To apply SMHITS to scene understanding, we also develop a system named ICES. Instead of using a classification or machine learning algorithm, ICES relies on a large image dataset that is independent of any scene domain. Results show that SMHITS outperforms ConceptNet's current associated-concept extraction algorithm, achieving higher precision and benefiting from expansion of its knowledge base. Results also show that the concepts ICES produces are semantically rich.
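The abstract gives only a high-level description of SMHITS: a ranking algorithm that, given a primitive group of concepts, returns associated concepts from a common-sense knowledge base such as ConceptNet. The name suggests a semantic variant of Kleinberg's HITS, so the sketch below shows what a seed-biased, weighted HITS over a concept graph could look like. This is an illustrative assumption, not the thesis's actual implementation; the graph construction, the base-set expansion step, and all names (smhits_associate, kb, seeds) are hypothetical.

```python
import networkx as nx

def smhits_associate(kb: nx.DiGraph, seeds, top_k=10):
    """Hypothetical seed-biased HITS over a ConceptNet-like graph.

    kb:    directed graph whose nodes are concepts and whose edges carry
           a 'weight' giving the strength of the semantic relation.
    seeds: the primitive concepts detected in a scene image.
    """
    # Base set: the seeds plus their immediate semantic neighbourhood,
    # mirroring the root-set/base-set construction of classic HITS.
    base = set(seeds)
    for s in seeds:
        base.update(kb.successors(s))
        base.update(kb.predecessors(s))

    # Run weighted HITS on the neighbourhood subgraph.
    hubs, auths = nx.hits(kb.subgraph(base), max_iter=500)

    # Associated ("implicit") concepts: the strongest authorities
    # outside the seed set, e.g. places or events implied by objects.
    ranked = sorted(auths, key=auths.get, reverse=True)
    return [c for c in ranked if c not in seeds][:top_k]

# Toy usage: object concepts detected in a photo surface a place concept.
kb = nx.DiGraph()
kb.add_weighted_edges_from([
    ("sand", "beach", 0.9), ("sea", "beach", 0.8),
    ("umbrella", "beach", 0.5), ("beach", "vacation", 0.6),
])
print(smhits_associate(kb, seeds=["sand", "sea"]))   # -> ['beach']
```

In this toy graph the detected objects "sand" and "sea" surface "beach", matching the thesis's stated goal of recovering places and events rather than only objects.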


Artificial intelligence review:
Research summary
This thesis addresses the problem of scene understanding by developing a new system that extracts implicit concepts from images instead of classifying them into predefined categories. The thesis develops a new algorithm called SMHITS, which relies on a common-sense knowledge base to extract the concepts semantically associated with a group of primitive concepts extracted from the images. The proposed system, called ICES, consists of two stages: the first relies on a non-specialized image base to extract the primitive concepts, and the second relies on the SMHITS algorithm to extract the implicit concepts. The results showed that SMHITS outperforms current algorithms in terms of precision and the semantic richness of the extracted concepts.
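The summary gives the two-stage structure of ICES but not the mechanics of either stage, so the following is a speculative sketch only. Stage 1 is approximated here by nearest-neighbour tag voting over a domain-independent annotated image base (the thesis's actual retrieval method is not specified), and Stage 2 reuses the smhits_associate sketch above; all names, parameters, and the feature representation are illustrative assumptions.

```python
import numpy as np

def stage1_primitive_concepts(query_vec, corpus_vecs, corpus_tags, k=5):
    """Stage 1 (sketch): vote tags from the k visually nearest images
    in a large, domain-independent annotated image base."""
    dists = np.linalg.norm(corpus_vecs - query_vec, axis=1)
    votes = {}
    for idx in np.argsort(dists)[:k]:
        for tag in corpus_tags[idx]:
            votes[tag] = votes.get(tag, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

def ices(query_vec, corpus_vecs, corpus_tags, kb, top_k=10):
    """Illustrative two-stage ICES pipeline: primitive concepts from
    image retrieval, then implicit concepts via SMHITS-style ranking."""
    primitives = stage1_primitive_concepts(query_vec, corpus_vecs, corpus_tags)
    known = [p for p in primitives if p in kb]      # keep concepts the KB covers
    implicit = smhits_associate(kb, known, top_k)   # sketch defined above
    return primitives, implicit
```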
Critical review
Although the thesis offers an innovative solution to the scene-understanding problem, several points could be improved. First, the thesis depends heavily on the image base used, which may limit the system's applicability in domains that require different image bases. Second, the algorithm still relies on a common-sense knowledge base that may contain errors or contradictions in its semantic relations. Third, the system could be improved by integrating modern machine learning techniques, such as deep learning, to increase the precision of implicit-concept extraction.
Questions related to the research
  1. What new algorithm was developed in this thesis?

    The new algorithm is called SMHITS. It relies on a common-sense knowledge base to extract the concepts semantically associated with a group of primitive concepts extracted from images.

  2. What stages make up the ICES system?

    ICES consists of two stages: the first relies on a non-specialized image base to extract the primitive concepts, and the second relies on the SMHITS algorithm to extract the implicit concepts.

  3. What results did SMHITS show compared to existing algorithms?

    The results showed that SMHITS outperforms existing algorithms in terms of precision and the semantic richness of the extracted concepts.

  4. What aspects of the thesis could be improved?

    The thesis could be improved by reducing its dependence on the image base used, improving the accuracy of the common-sense knowledge base, and integrating modern machine learning techniques such as deep learning.


Related research

To effectively apply robots in working environments and assist humans, it is essential to develop and evaluate how visual grounding (VG) can affect machine performance on occluded objects. However, current VG work is limited in working environments such as offices and warehouses, where objects are usually occluded due to space utilization issues. In our work, we propose a novel OCID-Ref dataset featuring a referring expression segmentation task with referring expressions of occluded objects. OCID-Ref consists of 305,694 referring expressions from 2,300 scenes, providing both RGB image and point cloud inputs. We argue that it is crucial to take advantage of both 2D and 3D signals to resolve challenging occlusion issues. Our experimental results demonstrate the effectiveness of aggregating 2D and 3D signals, but referring to occluded objects remains challenging for modern visual grounding systems. OCID-Ref is publicly available at https://github.com/lluma/OCID-Ref
Spoken language understanding (SLU) extracts the intended meaning from a user utterance and is a critical component of conversational virtual agents. In enterprise virtual agents (EVAs), language understanding is substantially challenging. First, the users are infrequent callers who are unfamiliar with the expectations of a pre-designed conversation flow. Second, the users are paying customers of an enterprise who demand a reliable, consistent and efficient user experience when resolving their issues. In this work, we describe a general and robust framework for intent and entity extraction utilizing a hybrid of statistical and rule-based approaches. Our framework includes confidence modeling that incorporates information from all components in the SLU pipeline, a critical addition for EVAs to ensure accuracy. Our focus is on creating accurate and scalable SLU that can be deployed rapidly for a large class of EVA applications with little need for human intervention.
How do people understand the meaning of the word "small" when used to describe a mosquito, a church, or a planet? While humans have a remarkable ability to form meanings by combining existing concepts, modeling this process is challenging. This paper addresses that challenge through the CEREBRA (Context-dEpendent meaning REpresentations in the BRAin) neural network model. CEREBRA characterizes how word meanings dynamically adapt in the context of a sentence by decomposing sentence fMRI into words and words into embodied brain-based semantic features. It demonstrates that words in different contexts have different representations and that word meaning changes in ways that are meaningful to human subjects. CEREBRA's context-based representations can potentially be used to make NLP applications more human-like.
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, one of the most widely spoken languages in the world, yet one with few available NLU datasets. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian that includes a range of language understanding tasks: reading comprehension, textual entailment, and so on. These datasets were collected in a multitude of ways, often involving manual annotation by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
Natural Language Understanding (NLU) is an established component within a conversational AI or digital assistant system, and it is responsible for producing semantic understanding of a user request. We propose a scalable and automatic approach for improving NLU in a large-scale conversational AI system by leveraging implicit user feedback, with the insight that user interaction data and dialog context contain rich embedded information from which user satisfaction and intention can be inferred. In particular, we propose a domain-agnostic framework for curating new supervision data for improving NLU from live production traffic. With an extensive set of experiments, we show the results of applying the framework and improving NLU for a large-scale production system across 10 domains.
