Currently, the visually impaired rely on either a sighted human, a guide dog, or a white cane to navigate safely. However, training guide dogs is extremely expensive, and canes cannot provide essential information such as the color of traffic lights or the direction of crosswalks. In this paper, we propose a deep learning-based solution that provides information on both the traffic light mode and the position of the zebra crossing. Previous solutions that utilize machine learning provide only one piece of information and are mostly binary, detecting only red or green lights. The proposed convolutional neural network, LYTNet, is designed for comprehensiveness, accuracy, and computational efficiency. LYTNet delivers the two most important pieces of information for the visually impaired to cross the road. We provide five classes of pedestrian traffic lights rather than the commonly seen three or four, along with a direction vector representing the midline of the zebra crossing that is converted from the 2D image plane to real-world positions. We created our own dataset of pedestrian traffic lights containing over 5000 photos taken at hundreds of intersections in Shanghai. Our experiments achieve a classification accuracy of 94%, an average angle error of 6.35 degrees, and a frame rate of 20 frames per second when testing the network on an iPhone 7 with additional post-processing steps.
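To make the dual-output design concrete, below is a minimal PyTorch sketch of a network with a shared backbone feeding (a) a 5-class traffic-light classifier and (b) a regressor for the zebra-crossing midline, parameterised here as two image-plane endpoints (x1, y1, x2, y2). This is an illustrative assumption, not the authors' published LYTNet architecture: the class name DualHeadNet, the backbone layers, the endpoint parameterisation, and the loss weight lam are all hypothetical.

    import torch
    import torch.nn as nn

    class DualHeadNet(nn.Module):
        """Hypothetical dual-head CNN in the spirit of LYTNet."""

        def __init__(self, num_classes: int = 5):
            super().__init__()
            # Small convolutional backbone; a MobileNet-style stack of
            # depthwise-separable blocks would be the natural choice for
            # on-phone inference.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(128, num_classes)  # traffic light mode
            self.regressor = nn.Linear(128, 4)             # midline endpoints

        def forward(self, x):
            feat = self.backbone(x).flatten(1)
            return self.classifier(feat), self.regressor(feat)

    # Joint training objective: cross-entropy on the light mode plus MSE on
    # the midline endpoints, weighted by lam (an assumed hyperparameter).
    def joint_loss(logits, coords, target_cls, target_coords, lam=0.5):
        ce = nn.functional.cross_entropy(logits, target_cls)
        mse = nn.functional.mse_loss(coords, target_coords)
        return ce + lam * mse

A single shared backbone with two lightweight heads keeps the parameter count small enough for mobile deployment while letting both outputs benefit from the same learned features, which is consistent with the efficiency goals stated in the abstract.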