This paper presents a field-programmable gate array (FPGA) design of a segmentation algorithm based on a convolutional neural network (CNN) that can process light detection and ranging (LiDAR) data in real time. For autonomous vehicles, drivable-region segmentation is an essential step that sets up the static constraints for planning tasks. Traditional drivable-region segmentation algorithms are mostly developed on camera data, so their performance is susceptible to lighting conditions and the quality of road markings. LiDAR sensors can capture the 3D geometry of the vehicle's surroundings with high precision, but processing the large volume of LiDAR data in real time is computationally challenging. In this paper, a convolutional neural network model is proposed and trained to perform semantic segmentation using data from the LiDAR sensor. An efficient hardware architecture is proposed and implemented on an FPGA that processes each LiDAR scan in 17.59 ms, much faster than previous works. Evaluated on the Ford and KITTI road detection benchmarks, the proposed solution achieves both high segmentation accuracy and real-time processing speed.
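A common way to feed LiDAR data to a 2D CNN for semantic segmentation is to project the point cloud onto a range image via spherical projection. The sketch below shows this preprocessing step under illustrative assumptions (function name, image size, and sensor field-of-view are not taken from the paper):

```python
import numpy as np

def spherical_projection(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image.
    Illustrative parameters: 64x512 image, Velodyne-like vertical FOV."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)     # range of each point
    yaw = np.arctan2(y, x)                 # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)               # elevation angle

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * w                      # column index
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * h   # row index

    u = np.clip(np.floor(u), 0, w - 1).astype(int)
    v = np.clip(np.floor(v), 0, h - 1).astype(int)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                        # keep last point per cell
    return image
```

Each pixel of the resulting image holds a range value, so a standard 2D CNN (and its FPGA implementation) can consume the scan directly.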
Simultaneous Localization and Mapping (SLAM) is a critical task for autonomous navigation. However, due to the computational complexity of SLAM algorithms, it is very difficult to achieve real-time implementation on low-power platforms. We propose an energy-efficient architecture for a real-time ORB (Oriented-FAST and Rotated-BRIEF) based visual SLAM system by accelerating the most time-consuming stages, feature extraction and matching, on an FPGA platform. Moreover, the original ORB descriptor pattern is reformed in a rotationally symmetric manner, which is much more hardware-friendly. Optimizations including rescheduling and parallelizing are further applied to improve throughput and reduce the memory footprint. Compared with Intel i7 and ARM Cortex-A9 CPUs on the TUM dataset, our FPGA realization achieves up to 3X and 31X frame-rate improvement, as well as up to 71X and 25X energy-efficiency improvement, respectively.
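The matching stage being accelerated reduces, per descriptor pair, to a Hamming distance: XOR the two 256-bit binary descriptors and count the set bits, which maps directly onto XOR gates plus a popcount tree in hardware. A minimal software sketch (function names and the distance threshold are illustrative, not the paper's interface):

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary ORB descriptors, each packed
    into a Python int: popcount of the XOR. On an FPGA this is a single
    XOR followed by a popcount adder tree."""
    return bin(d1 ^ d2).count("1")

def match(descs_a, descs_b, max_dist=64):
    """Brute-force nearest-neighbour matching of two descriptor lists,
    the compute pattern the FPGA parallelizes. Returns (i, j, dist)
    tuples for matches under the (illustrative) distance threshold."""
    matches = []
    for i, da in enumerate(descs_a):
        j, d = min(((j, hamming(da, db)) for j, db in enumerate(descs_b)),
                   key=lambda t: t[1])
        if d <= max_dist:
            matches.append((i, j, d))
    return matches
```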
Small animal Positron Emission Tomography (PET) is dedicated to small animal imaging. Animals used in experiments, such as rats and monkeys, are often much smaller than human bodies, which requires higher position and energy precision from the PET imaging system. In addition, flexibility and high efficiency are major demands on a practical PET system. These require a high-quality analog front-end and digital signal processing logic that is efficient and compatible with multiple data processing modes. The digital signal processing logic of the small animal PET system presented in this paper implements 32-channel signal processing in a single Xilinx Artix-7 family Field-Programmable Gate Array (FPGA). The logic is designed to support three online modes: regular package mode, flood map, and energy spectrum histogram. Several functions are integrated, including two-dimensional (2D) raw position calculation, crystal identification, and event filtering. A series of online corrections is also integrated, such as photon peak correction to 511 keV and timing offset correction at crystal granularity. A Gigabit Ethernet interface is used for data transfer, Look-Up Table (LUT) configuration, and command issuing. The pipelined logic processes the signals at 125 MHz with a rate of 1,000,000 events/s. A series of initial tests was conducted; the results indicate that the digital processing logic meets expectations.
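To make the photopeak-correction and event-filtering steps concrete: each crystal's measured photopeak position (found offline from its energy-spectrum histogram) yields a per-crystal gain that rescales raw energies so the peak lands at 511 keV, after which an energy window rejects scattered events. The sketch below shows the arithmetic only; the function names, window bounds, and calibration source are illustrative, not taken from the paper:

```python
def photopeak_correction(raw_energy, measured_peak, target_kev=511.0):
    """Rescale a raw energy value so this crystal's measured photopeak
    lands at 511 keV. In FPGA logic this is typically one per-crystal
    LUT lookup (the gain) plus one multiply."""
    return raw_energy * (target_kev / measured_peak)

def energy_window(e_kev, low=400.0, high=650.0):
    """Simple event filter: keep only events whose corrected energy
    falls inside the (illustrative) acceptance window."""
    return low <= e_kev <= high
```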
Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs. Whilst various stereo algorithms have been deployed on these platforms, usually cut down to better match the embedded architecture, certain key parts of the more advanced algorithms, e.g. those that rely on unpredictable memory access or are highly iterative in nature, are difficult to deploy efficiently on FPGAs, and thus the achievable depth quality is limited. In this paper, we leverage an FPGA-CPU chip to propose a novel, sophisticated stereo approach that combines the best features of SGM- and ELAS-based methods to compute highly accurate dense depth in real time. Our approach achieves an 8.7% error rate on the challenging KITTI 2015 dataset at over 50 FPS, with a power consumption of only 5 W.
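For readers unfamiliar with SGM, its core is a dynamic-programming pass over the matching-cost volume along several scanline directions. A minimal sketch of one left-to-right pass over a (width, disparities) cost slice, with illustrative penalty values (this is textbook SGM, not the paper's hybrid SGM/ELAS pipeline):

```python
import numpy as np

def aggregate_path(cost, p1=10, p2=150):
    """One left-to-right SGM aggregation pass over a (W, D) cost slice:
    L(x, d) = C(x, d) + min(L(x-1, d),
                            L(x-1, d-1) + P1, L(x-1, d+1) + P1,
                            min_d' L(x-1, d') + P2) - min_d' L(x-1, d').
    Full SGM sums several such directional passes before taking the
    per-pixel argmin over disparities."""
    w, d = cost.shape
    agg = np.empty((w, d), dtype=np.float64)
    agg[0] = cost[0]
    for x in range(1, w):
        prev = agg[x - 1]
        prev_min = prev.min()
        shifted_up = np.concatenate(([np.inf], prev[:-1])) + p1  # d-1 term
        shifted_dn = np.concatenate((prev[1:], [np.inf])) + p1   # d+1 term
        best = np.minimum.reduce([prev, shifted_up, shifted_dn,
                                  np.full(d, prev_min + p2)])
        agg[x] = cost[x] + best - prev_min
    return agg
```

The sequential dependence of each column on the previous one is exactly the "highly iterative" structure the abstract notes is hard to map onto an FPGA.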
A real-time ranging lidar with a 0.1 MHz update rate and few-micrometer resolution, incorporating dispersive Fourier transformation and instantaneous microwave frequency measurement, is proposed and demonstrated. As the time-stretched femtosecond laser pulse passes through an all-fiber Mach-Zehnder interferometer, where the detection light beam is inserted into the optical path of one arm, the displacement is encoded into the frequency variation of the temporal interferogram. To deal with the challenges of storing and processing the microwave pulse generated on a photodetector in real time, we turn to all-optical signal processing. A carrier wave is modulated by the time-domain interferogram using an intensity modulator. After that, the frequency variation of the microwave pulse is transferred to the first-order sidebands. Finally, the frequency shift of the sidebands is converted into a transmission change through a symmetric-locked frequency discriminator. In the experiment, a real-time ranging system with adjustable dynamic range and detection sensitivity is realized by incorporating a programmable optical filter. A standard deviation of 7.64 μm with an overall mean error of 19.10 μm over a 15 mm detection range, and a standard deviation of 37.73 μm with an overall mean error of 36.63 μm over a 45 mm detection range, are obtained respectively.
Modern mobile neural networks with a reduced number of weights and parameters do a good job on image classification tasks, but even they may be too complex to implement in an FPGA for video processing tasks. The article proposes a neural network architecture for the practical task of recognizing images from a camera, which has several advantages in terms of speed. This is achieved by reducing the number of weights, moving from floating-point to fixed-point arithmetic, and through a number of hardware-level optimizations associated with storing weights in blocks, a shift register, and an adjustable number of convolutional blocks that work in parallel. The article also proposes methods for adapting an existing dataset to a different task. As the experiments show, the proposed neural network copes well with real-time video processing even on cheap FPGAs.
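The float-to-fixed-point move mentioned above can be sketched as follows: weights and activations are scaled by a power of two, stored as integers, and multiplied with a right-shift to renormalize, which is the pattern that maps onto FPGA DSP blocks. The format (Q8.8 here) and function names are illustrative, not the article's exact choice:

```python
def to_fixed(x, frac_bits=8, word_bits=16):
    """Quantize a float to signed fixed-point (here Q8.8 in a 16-bit
    word), saturating at the representable range, as done when moving
    CNN weights from floating-point to integer arithmetic."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def fixed_mul(a, b, frac_bits=8):
    """Fixed-point multiply: integer product, then right-shift to drop
    the doubled fractional bits."""
    return (a * b) >> frac_bits

# Example: 1.5 * 0.25 in Q8.8
a, b = to_fixed(1.5), to_fixed(0.25)   # 384, 64
product = fixed_mul(a, b)              # 96, i.e. 0.375 in Q8.8
```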