
GLADAS: Gesture Learning for Advanced Driver Assistance Systems

Published by: Ethan Shaotran
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Human-computer interaction (HCI) is crucial for road safety as autonomous vehicles (AVs) become commonplace. Yet, little effort has been put toward ensuring that AVs understand humans on the road. In this paper, we present GLADAS, a simulator-based research platform designed to teach AVs to understand pedestrian hand gestures. GLADAS supports the training, testing, and validation of deep learning-based self-driving car gesture recognition systems. We focus on gestures because they are a primordial (i.e., natural and common) way to interact with cars. To the best of our knowledge, GLADAS is the first system of its kind designed to provide an infrastructure for further research into human-AV interaction. We also develop a hand gesture recognition algorithm for self-driving cars, using GLADAS to evaluate its performance. Our results show that an AV understands human gestures 85.91% of the time, reinforcing the need for further research into human-AV interaction.
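The abstract does not spell out the gesture recognition algorithm itself. As a hedged illustration of the kind of train-then-evaluate pipeline a simulator platform like GLADAS supports, the sketch below fits a toy nearest-centroid classifier on synthetic hand-keypoint feature vectors and reports its accuracy. Every name here (the gesture classes, the two-dimensional features, the helper functions) is an invented stand-in, not the paper's actual model or data:

```python
import math
import random

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(sample, centroids):
    """Return the label whose centroid is nearest (Euclidean) to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

# Synthetic "simulator" data: each gesture class is a noisy cluster of
# feature vectors, standing in for frames a simulator would render.
random.seed(0)
prototypes = {"stop": [1.0, 0.0], "go": [0.0, 1.0], "slow": [1.0, 1.0]}

def sample_of(label):
    return [x + random.gauss(0, 0.15) for x in prototypes[label]]

# Train: estimate one centroid per gesture class from 20 noisy samples each.
train = {label: [sample_of(label) for _ in range(20)] for label in prototypes}
centroids = {label: centroid(vecs) for label, vecs in train.items()}

# Evaluate: accuracy over a held-out synthetic test set of 30 samples.
test_set = [(label, sample_of(label)) for label in prototypes for _ in range(10)]
correct = sum(classify(vec, centroids) == label for label, vec in test_set)
accuracy = correct / len(test_set)
```

A real system of the kind the paper evaluates would replace the synthetic vectors with features extracted from simulator frames and the nearest-centroid rule with a trained deep network; the evaluation loop (classify held-out samples, count agreements, report a rate such as 85.91%) has the same shape.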




Read also

With the development of advanced communication technology, connected vehicles are becoming increasingly popular in our transportation systems; they can conduct cooperative maneuvers with each other and with road entities through vehicle-to-everything communication. Considerable research interest has been devoted to other building blocks of a connected vehicle system, such as communication, planning, and control. However, fewer studies have focused on human-machine cooperation and interfaces, namely how to visualize guidance information for the driver as an advanced driver-assistance system (ADAS). In this study, we propose an augmented reality (AR)-based ADAS, which visualizes guidance information calculated cooperatively by multiple connected vehicles. An unsignalized intersection scenario is adopted as the use case for this system, where the driver can drive the connected vehicle across the intersection under AR guidance, without any full stop at the intersection. A simulation environment is built in the Unity game engine based on the road network of San Francisco, and human-in-the-loop (HITL) simulation is conducted to validate the effectiveness of our proposed system in terms of travel time and energy consumption.
Computer Vision, either alone or combined with other technologies such as radar or Lidar, is one of the key technologies used in Advanced Driver Assistance Systems (ADAS). Its role in understanding and analysing the driving scene is of great importance, as can be seen from the number of ADAS applications that use this technology. However, porting a vision algorithm to an embedded automotive system is still very challenging, as there must be a trade-off between several design requisites. Furthermore, there is no standard implementation platform, so different alternatives have been proposed by both the scientific community and industry. This paper aims to review the requisites and the different embedded implementation platforms that can be used for Computer Vision-based ADAS, with a critical analysis and an outlook on future trends.
Vision-based driver assistance systems are one of the most rapidly growing research areas of ITS, owing to factors such as increased safety requirements in automotive, growing computational power in embedded systems, and the drive toward autonomous driving. It is a cross-disciplinary area encompassing specialised fields such as computer vision, machine learning, robotic navigation, embedded systems, automotive electronics, and safety-critical software. In this paper, we survey vision-based advanced driver assistance systems with a consistent terminology and propose a taxonomy. We also propose an abstract model in an attempt to formalize a top-down view of application development that scales toward an autonomous driving system.
Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline. This can lead to modeling inefficiencies and may introduce inconsistencies that limit the achievable naturalness. We propose to instead synthesize the two modalities in a single model, a new problem we call integrated speech and gesture synthesis (ISG). We also propose a set of models modified from state-of-the-art neural speech-synthesis engines to achieve this goal. We evaluate the models in three carefully designed user studies, two of which evaluate the synthesized speech and gesture in isolation, plus a combined study that evaluates the models as they will be used in real-world applications -- speech and gesture presented together. The results show that participants rate one of the proposed integrated synthesis models as being as good as the state-of-the-art pipeline system we compare against, in all three tests. The model is able to achieve this with faster synthesis time and a greatly reduced parameter count compared to the pipeline system, illustrating some of the potential benefits of treating speech and gesture synthesis together as a single, unified problem. Videos and code are available on our project page at https://swatsw.github.io/isg_icmi21/
Sensory substitution can help persons with perceptual deficits. In this work, we attempt to visualize audio with video. Our long-term goal is to create sound perception for hearing impaired people, for instance, to facilitate feedback for training deaf speech. Different from existing models that translate between speech and text or text and images, we target an immediate and low-level translation that applies to generic environment sounds and human speech without delay. No canonical mapping is known for this artificial translation task. Our design is to translate from audio to video by compressing both into a common latent space with shared structure. Our core contribution is the development and evaluation of learned mappings that respect human perception limits and maximize user comfort by enforcing priors and combining strategies from unpaired image translation and disentanglement. We demonstrate qualitatively and quantitatively that our AudioViewer model maintains important audio features in the generated video and that generated videos of faces and numbers are well suited for visualizing high-dimensional audio features since they can easily be parsed by humans to match and distinguish between sounds, words, and speakers.
