بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World

88 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Kailun Yang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jiaming Zhang - Kailun Yang - Angela Constantinescu

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Common fully glazed facades and transparent objects present architectural barriers and impede the mobility of people with low vision or blindness, for instance, a path detected behind a glass door is inaccessible unless it is correctly perceived and reacted. However, segmenting these safety-critical objects is rarely covered by conventional assistive technologies. To tackle this issue, we construct a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) model, which is capable of segmenting general and transparent objects and performing real-time wayfinding to assist people walking alone more safely. Especially, both decoders created by our proposed Transformer Parsing Module (TPM) enable effective joint learning from different datasets. Besides, the efficient Trans4Trans model composed of symmetric transformer-based encoder and decoder, requires little computational expenses and is readily deployed on portable GPUs. Our Trans4Trans model outperforms state-of-the-art methods on the test sets of Stanford2D3D and Trans10K-v2 datasets and obtains mIoU of 45.13% and 75.14%, respectively. Through various pre-tests and a user study conducted in indoor and outdoor scenarios, the usability and reliability of our assistive system have been extensively verified.

قيم البحث

398 - Jiaming Zhang , Kailun Yang , Angela Constantinescu 2021

Transparent objects, such as glass walls and doors, constitute architectural obstacles hindering the mobility of people with low vision or blindness. For instance, the open space behind glass doors is inaccessible, unless it is correctly perceived an d interacted with. However, traditional assistive technologies rarely cover the segmentation of these safety-critical transparent objects. In this paper, we build a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) perception model, which can segment general- and transparent objects. The two dense segmentation results are further combined with depth information in the system to help users navigate safely and assist them to negotiate transparent obstacles. We propose a lightweight Transformer Parsing Module (TPM) to perform multi-scale feature interpretation in the transformer-based decoder. Benefiting from TPM, the double decoders can perform joint learning from corresponding datasets to pursue robustness, meanwhile maintain efficiency on a portable GPU, with negligible calculation increase. The entire Trans4Trans model is constructed in a symmetrical encoder-decoder architecture, which outperforms state-of-the-art methods on the test sets of Stanford2D3D and Trans10K-v2 datasets, obtaining mIoU of 45.13% and 75.14%, respectively. Through a user study and various pre-tests conducted in indoor and outdoor scenes, the usability and reliability of our assistive system have been extensively verified. Meanwhile, the Tran4Trans model has outstanding performances on driving scene datasets. On Cityscapes, ACDC, and DADA-seg datasets corresponding to common environments, adverse weather, and traffic accident scenarios, mIoU scores of 81.5%, 76.3%, and 39.2% are obtained, demonstrating its high efficiency and robustness for real-world transportation applications.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب علم الروبوتات

Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation

113 - Haobin Tan , Chang Chen , Xinyu Luo 2021

Lacking the ability to sense ambient environments effectively, blind and visually impaired people (BVIP) face difficulty in walking outdoors, especially in urban areas. Therefore, tools for assisting BVIP are of great importance. In this paper, we pr opose a novel flying guide dog prototype for BVIP assistance using drone and street view semantic segmentation. Based on the walkable areas extracted from the segmentation prediction, the drone can adjust its movement automatically and thus lead the user to walk along the walkable path. By recognizing the color of pedestrian traffic lights, our prototype can help the user to cross a street safely. Furthermore, we introduce a new dataset named Pedestrian and Vehicle Traffic Lights (PVTL), which is dedicated to traffic light recognition. The result of our user study in real-world scenarios shows that our prototype is effective and easy to use, providing new insight into BVIP assistance.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب علم الروبوتات

HIDA: Towards Holistic Indoor Understanding for the Visually Impaired via Semantic Instance Segmentation with a Wearable Solid-State LiDAR Sensor

162 - Huayao Liu , Ruiping Liu , Kailun Yang 2021

Independently exploring unknown spaces or finding objects in an indoor environment is a daily but challenging task for visually impaired people. However, common 2D assistive systems lack depth relationships between various objects, resulting in diffi culty to obtain accurate spatial layout and relative positions of objects. To tackle these issues, we propose HIDA, a lightweight assistive system based on 3D point cloud instance segmentation with a solid-state LiDAR sensor, for holistic indoor detection and avoidance. Our entire system consists of three hardware components, two interactive functions~(obstacle avoidance and object finding) and a voice user interface. Based on voice guidance, the point cloud from the most recent state of the changing indoor environment is captured through an on-site scanning performed by the user. In addition, we design a point cloud segmentation model with dual lightweight decoders for semantic and offset predictions, which satisfies the efficiency of the whole system. After the 3D instance segmentation, we post-process the segmented point cloud by removing outliers and projecting all points onto a top-view 2D map representation. The system integrates the information above and interacts with users intuitively by acoustic feedback. The proposed 3D instance segmentation model has achieved state-of-the-art performance on ScanNet v2 dataset. Comprehensive field tests with various tasks in a user study verify the usability and effectiveness of our system for assisting visually impaired people in holistic indoor understanding, obstacle avoidance and object search.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب علم الروبوتات

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

119 - Lina Mezghani , Sainbayar Sukhbaatar , Arthur Szlam 2020

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training. In this paper, we introduc e a novel approach for learning to navigate from image inputs without external supervision or reward. Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by setting its own goals. The model is trained with intrinsic rewards only so that it can be applied to any environment with image observations. We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

A Real-time Localization System Using RFID for Visually Impaired

383 - Tae Qinghui , Muhammad Yasir Malik , Youngjee Hong 2011

Gadgets helping the disabled, especially blind that are in least accessibility of information, use acoustic methods that can cause stress to ear and infringe users privacy. Even if some project uses embedded Radio Frequency Identification (RFID) into the sidewalk for blinds free walking, the tag memory design is not specified for buildings and road conditions. This paper suggested allocation scheme of RFID tag referring to EPCglobal SGLN, tactile method for conveying information, and use of lithium battery as power source with solar cells as an alternative. Results have shown independent mobility, accidents prevention, stress relief and satisfied factors in terms of cost and human usability.

أنظمة متعددة العملاء بنية الشبكات والإنترنت

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الأميركية في بيروت

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً