No Arabic abstract
Simultaneous Localization and Mapping (SLAM) is a critical task for autonomous navigation. However, due to the computational complexity of SLAM algorithms, it is very difficult to achieve real-time implementation on low-power platforms.We propose an energy efficient architecture for real-time ORB (Oriented-FAST and Rotated- BRIEF) based visual SLAM system by accelerating the most time consuming stages of feature extraction and matching on FPGA platform.Moreover, the original ORB descriptor pattern is reformed as a rotational symmetric manner which is much more hardware friendly. Optimizations including rescheduling and parallelizing are further utilized to improve the throughput and reduce the memory footprint. Compared with Intel i7 and ARM Cortex-A9 CPUs on TUM dataset, our FPGA realization achieves up to 3X and 31X frame rate improvement, as well as up to 71X and 25X energy efficiency improvement, respectively.
This paper presents a field-programmable gate array (FPGA) design of a segmentation algorithm based on convolutional neural network (CNN) that can process light detection and ranging (LiDAR) data in real-time. For autonomous vehicles, drivable region segmentation is an essential step that sets up the static constraints for planning tasks. Traditional drivable region segmentation algorithms are mostly developed on camera data, so their performance is susceptible to the light conditions and the qualities of road markings. LiDAR sensors can obtain the 3D geometry information of the vehicle surroundings with high precision. However, it is a computational challenge to process a large amount of LiDAR data in real-time. In this paper, a convolutional neural network model is proposed and trained to perform semantic segmentation using data from the LiDAR sensor. An efficient hardware architecture is proposed and implemented on an FPGA that can process each LiDAR scan in 17.59 ms, which is much faster than the previous works. Evaluated using Ford and KITTI road detection benchmarks, the proposed solution achieves both high accuracy in performance and real-time processing in speed.
In this paper, a real-time signal processing frame-work based on a 60 GHz frequency-modulated continuous wave (FMCW) radar system to recognize gestures is proposed. In order to improve the robustness of the radar-based gesture recognition system, the proposed framework extracts a comprehensive hand profile, including range, Doppler, azimuth and elevation, over multiple measurement-cycles and encodes them into a feature cube. Rather than feeding the range-Doppler spectrum sequence into a deep convolutional neural network (CNN) connected with recurrent neural networks, the proposed framework takes the aforementioned feature cube as input of a shallow CNN for gesture recognition to reduce the computational complexity. In addition, we develop a hand activity detection (HAD) algorithm to automatize the detection of gestures in real-time case. The proposed HAD can capture the time-stamp at which a gesture finishes and feeds the hand profile of all the relevant measurement-cycles before this time-stamp into the CNN with low latency. Since the proposed framework is able to detect and classify gestures at limited computational cost, it could be deployed in an edge-computing platform for real-time applications, whose performance is notedly inferior to a state-of-the-art personal computer. The experimental results show that the proposed framework has the capability of classifying 12 gestures in real-time with a high F1-score.
Video enhancement is a challenging problem, more than that of stills, mainly due to high computational cost, larger data volumes and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often coupled with the lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing one single discriminator that learns the joint distribution of source and target domain simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over the state-of-the-art methods, in all terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer is capable of enhancing over 35 frames per second of FullHD video (1080x1920).
In the real-life environments, due to the sudden appearance of windows, lights, and objects blocking the light source, the visual SLAM system can easily capture the low-contrast images caused by over-exposure or over-darkness. At this time, the direct method of estimating camera motion based on pixel luminance information is infeasible, and it is often difficult to find enough valid feature points without image processing. This paper proposed HE-SLAM, a new method combining histogram equalization and ORB feature extraction, which can be robust in more scenes, especially in stages with low-contrast images. Because HE-SLAM uses histogram equalization to improve the contrast of images, it can extract enough valid feature points in low-contrast images for subsequent feature matching, keyframe selection, bundle adjustment, and loop closure detection. The proposed HE-SLAM has been tested on the popular datasets (such as KITTI and EuRoc), and the real-time performance and robustness of the system are demonstrated by comparing system runtime and the mean square root error (RMSE) of absolute trajectory error (ATE) with state-of-the-art methods like ORB-SLAM2.
With the increasing awareness of privacy protection and data fragmentation problem, federated learning has been emerging as a new paradigm of machine learning. Federated learning tends to utilize various privacy preserving mechanisms to protect the transferred intermediate data, among which homomorphic encryption strikes a balance between security and ease of utilization. However, the complicated operations and large operands impose significant overhead on federated learning. Maintaining accuracy and security more efficiently has been a key problem of federated learning. In this work, we investigate a hardware solution, and design an FPGA-based homomorphic encryption framework, aiming to accelerate the training phase in federated learning. The root complexity lies in searching for a compact architecture for the core operation of homomorphic encryption, to suit the requirement of federated learning about high encryption throughput and flexibility of configuration. Our framework implements the representative Paillier homomorphic cryptosystem with high level synthesis for flexibility and portability, with careful optimization on the modular multiplication operation in terms of processing clock cycle, resource usage and clock frequency. Our accelerator achieves a near-optimal execution clock cycle, with a better DSP-efficiency than existing designs, and reduces the encryption time by up to 71% during training process of various federated learning models.