
A Semantic Approach for Improving Scene Understanding

Arabic title: مقارنة دلالية لتحسين فهم المشهد (A Semantic Approach for Improving Scene Understanding)

Publication date: 2015
Language: Arabic





People live in diverse environments, yet they can understand the scenes around them at a glance. To do this, they rely on their ability to process visual data efficiently and to connect it to extensive prior knowledge about what they expect to see. Computers, by contrast, have not yet reached comparable levels of scene understanding. Most research treats scene understanding as an ordinary classification problem, in which scenes are sorted into a small set of predefined categories (forest, city, garden). Such work typically applies classification or machine learning algorithms, which limits its depth of understanding and reduces its practical applicability because of the training phase these algorithms require. Some studies try to exploit the knowledge in ontologies to reach higher-level scene understanding, but they remain restricted to specific domains. In this thesis we aim to understand scene images without any prior knowledge of their domain. Rather than treating the task as an ordinary classification problem, we extract high-level concepts from scene images; these concepts represent not only the objects in the scene but also its places and events. To do this, we develop a novel algorithm named SMHITS, which relies on a semantically rich common-sense knowledge base to extract the concepts associated with a primitive group of concepts. To apply SMHITS to scene understanding, we also develop a system named ICES. Instead of using a classification or machine learning algorithm, ICES relies on a large, domain-independent image dataset. Results show that SMHITS outperforms the current ConceptNet associated-concept extraction algorithm: it achieves higher precision and can benefit from expansions of its knowledge base. Results also show that the concepts output by ICES are semantically rich.
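The abstract names SMHITS but does not reproduce its update rules. Since the name suggests a HITS-style iteration over a common-sense graph such as ConceptNet, the following Python sketch illustrates how such an association scorer could work; the edge-weight format, the seed re-injection step, and all function and variable names are assumptions made for illustration, not the published algorithm.

```python
import numpy as np

def smhits_sketch(edges, seed_concepts, iterations=20, top_k=10):
    """Hypothetical sketch of a HITS-style association scorer.

    edges: list of (concept_a, relation_weight, concept_b) triples taken
        from a ConceptNet-like common-sense knowledge base.
    seed_concepts: primitive concepts detected in the scene image.
    Returns the top_k non-seed concepts ranked by authority score.
    """
    # Index every concept that appears in the edge list.
    concepts = sorted({c for a, _, b in edges for c in (a, b)})
    idx = {c: i for i, c in enumerate(concepts)}
    n = len(concepts)

    # Weighted adjacency matrix of the knowledge subgraph.
    A = np.zeros((n, n))
    for a, w, b in edges:
        A[idx[a], idx[b]] = w

    # Classic HITS iteration: hubs point to good authorities and
    # authorities are pointed to by good hubs. The seeds keep a floor
    # hub score so activation keeps spreading outward from them.
    hub = np.array([1.0 if c in seed_concepts else 0.0 for c in concepts])
    auth = np.zeros(n)
    for _ in range(iterations):
        auth = A.T @ hub
        hub = A @ auth
        for c in seed_concepts:  # re-inject the primitive concepts
            if c in idx:
                hub[idx[c]] = max(hub[idx[c]], 1.0)
        auth /= np.linalg.norm(auth) or 1.0
        hub /= np.linalg.norm(hub) or 1.0

    ranked = sorted(
        (c for c in concepts if c not in seed_concepts),
        key=lambda c: auth[idx[c]],
        reverse=True,
    )
    return ranked[:top_k]
```

With ConceptNet-style triples such as ('dog', 0.8, 'bark') and seeds such as {'dog', 'leash', 'grass'}, the top authorities would tend to be concepts linked to several seeds at once, matching the abstract's goal of surfacing places and events rather than only objects.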


Artificial intelligence review
Research summary
This thesis addresses the problem of scene understanding by developing a new system that extracts implicit concepts from images rather than classifying them into predefined categories. It develops a new algorithm called SMHITS, which relies on a common-sense knowledge base to extract the concepts semantically associated with a group of primitive concepts extracted from the images. The proposed system, named ICES, consists of two stages: the first relies on a non-specialized image dataset to extract the primitive concepts, and the second applies SMHITS to extract the implicit concepts. The results show that SMHITS outperforms existing algorithms in precision and in the semantic richness of the extracted concepts.
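The summary above describes ICES as a two-stage pipeline. A rough sketch of that flow follows; the nearest-neighbour retrieval in stage 1, the callables `similarity` and `smhits`, and all names are assumptions made for illustration, not details taken from the thesis.

```python
def ices_pipeline(query_image, annotated_images, knowledge_edges,
                  similarity, smhits, k=5):
    """Hypothetical two-stage ICES flow.

    annotated_images: [(descriptor, [concept, ...]), ...] drawn from a
        domain-independent image dataset (stage 1 input).
    similarity: any visual matcher, e.g. a descriptor distance.
    smhits: an association scorer such as the sketch above (stage 2).
    """
    # Stage 1: borrow tags from the k most visually similar images
    # to obtain the primitive concepts of the query scene.
    neighbours = sorted(annotated_images,
                        key=lambda item: similarity(query_image, item[0]),
                        reverse=True)[:k]
    primitive = {c for _, tags in neighbours for c in tags}

    # Stage 2: expand the primitive concepts into implicit ones
    # (places, events) via the common-sense knowledge base.
    implicit = smhits(knowledge_edges, primitive)
    return primitive, implicit
```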
Critical review
Although the thesis offers an innovative solution to the scene-understanding problem, several points could be improved. First, the system depends heavily on the image dataset used, which may limit its applicability in domains that require different datasets. Second, the algorithm still relies on a common-sense knowledge base that may contain errors or inconsistencies in its semantic relations. Third, the system could be improved by integrating modern machine learning techniques, such as deep learning, to raise the precision of implicit-concept extraction.
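As a concrete illustration of that third point, stage 1 of ICES could in principle be replaced by a pretrained deep classifier. The following is a minimal sketch with torchvision (my own suggestion, not part of the thesis; it assumes torchvision >= 0.13 and an ImageNet-pretrained ResNet-50, so the returned labels are ImageNet categories standing in for primitive concepts).

```python
import torch
from PIL import Image
from torchvision import models

# ImageNet-pretrained ResNet-50 with its bundled preprocessing.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

def primitive_concepts(image_path, top_k=5):
    """Return the top_k ImageNet labels as primitive scene concepts."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1)[0]
    top = probs.topk(top_k)
    return [weights.meta["categories"][i] for i in top.indices.tolist()]
```

Labels obtained this way could then be passed to SMHITS exactly like the retrieval-based primitive concepts of stage 1.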
Questions related to the research
  1. What new algorithm is developed in this thesis?

    The new algorithm is called SMHITS. It relies on a common-sense knowledge base to extract the concepts semantically associated with a group of primitive concepts extracted from the images.

  2. What stages make up the ICES system?

    ICES consists of two stages: the first uses a non-specialized image dataset to extract the primitive concepts, and the second applies the SMHITS algorithm to extract the implicit concepts.

  3. How did SMHITS perform compared with existing algorithms?

    The results show that SMHITS outperforms existing algorithms in precision and in the semantic richness of the extracted concepts.

  4. What aspects of the thesis could be improved?

    The thesis could be improved by reducing its dependence on the image dataset used, improving the accuracy of the common-sense knowledge base, and integrating modern machine learning techniques such as deep learning.


Related research

To effectively deploy robots in working environments where they assist humans, it is essential to develop and evaluate how visual grounding (VG) affects machine performance on occluded objects. However, current VG work is limited in working environments such as offices and warehouses, where objects are usually occluded due to space utilization. In our work, we propose a novel OCID-Ref dataset featuring a referring expression segmentation task with referring expressions of occluded objects. OCID-Ref consists of 305,694 referring expressions from 2,300 scenes, providing both RGB image and point cloud inputs. To resolve challenging occlusion issues, we argue that it is crucial to take advantage of both 2D and 3D signals. Our experimental results demonstrate the effectiveness of aggregating 2D and 3D signals, but referring to occluded objects remains challenging for modern visual grounding systems. OCID-Ref is publicly available at https://github.com/lluma/OCID-Ref
Spoken language understanding (SLU) extracts the intended meaning from a user utterance and is a critical component of conversational virtual agents. In enterprise virtual agents (EVAs), language understanding is substantially challenging. First, the users are infrequent callers who are unfamiliar with the expectations of a pre-designed conversation flow. Second, the users are paying customers of an enterprise who demand a reliable, consistent and efficient user experience when resolving their issues. In this work, we describe a general and robust framework for intent and entity extraction utilizing a hybrid of statistical and rule-based approaches. Our framework includes confidence modeling that incorporates information from all components in the SLU pipeline, a critical addition for EVAs to ensure accuracy. Our focus is on creating accurate and scalable SLU that can be deployed rapidly for a large class of EVA applications with little need for human intervention.
How do people understand the meaning of the word "small" when used to describe a mosquito, a church, or a planet? While humans have a remarkable ability to form meanings by combining existing concepts, modeling this process is challenging. This paper addresses that challenge through the CEREBRA (Context-dEpendent meaning REpresentations in the BRAin) neural network model. CEREBRA characterizes how word meanings dynamically adapt in the context of a sentence by decomposing sentence fMRI into words and words into embodied brain-based semantic features. It demonstrates that words in different contexts have different representations and that word meaning changes in ways that are meaningful to human subjects. CEREBRA's context-based representations can potentially be used to make NLP applications more human-like.
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, one of the most widely spoken languages in the world, for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian that includes a range of language understanding tasks: reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotation by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
Natural Language Understanding (NLU) is an established component within a conversational AI or digital assistant system, and it is responsible for producing semantic understanding of a user request. We propose a scalable and automatic approach for improving NLU in a large-scale conversational AI system by leveraging implicit user feedback, with the insight that user interaction data and dialog context embed rich information from which user satisfaction and intention can be inferred. In particular, we propose a domain-agnostic framework for curating new supervision data for improving NLU from live production traffic. With an extensive set of experiments, we show the results of applying the framework and improving NLU for a large-scale production system across 10 domains.
