No Arabic abstract
Moving towards autonomy, unmanned vehicles rely heavily on state-of-the-art collision avoidance systems (CAS). However, the detection of obstacles especially during night-time is still a challenging task since the lighting conditions are not sufficient for traditional cameras to function properly. Therefore, we exploit the powerful attributes of event-based cameras to perform obstacle detection in low lighting conditions. Event cameras trigger events asynchronously at high output temporal rate with high dynamic range of up to 120 $dB$. The algorithm filters background activity noise and extracts objects using robust Hough transform technique. The depth of each detected object is computed by triangulating 2D features extracted utilising LC-Harris. Finally, asynchronous adaptive collision avoidance (AACA) algorithm is applied for effective avoidance. Qualitative evaluation is compared using event-camera and traditional camera.
Event-based cameras are vision devices that transmit only brightness changes with low latency and ultra-low power consumption. Such characteristics make event-based cameras attractive in the field of localization and object tracking in resource-constrained systems. Since the number of generated events in such cameras is huge, the selection and filtering of the incoming events are beneficial from both increasing the accuracy of the features and reducing the computational load. In this paper, we present an algorithm to detect asynchronous corners from a stream of events in real-time on embedded systems. The algorithm is called the Three Layer Filtering-Harris or TLF-Harris algorithm. The algorithm is based on an events filtering strategy whose purpose is 1) to increase the accuracy by deliberately eliminating some incoming events, i.e., noise, and 2) to improve the real-time performance of the system, i.e., preserving a constant throughput in terms of input events per second, by discarding unnecessary events with a limited accuracy loss. An approximation of the Harris algorithm, in turn, is used to exploit its high-quality detection capability with a low-complexity implementation to enable seamless real-time performance on embedded computing platforms. The proposed algorithm is capable of selecting the best corner candidate among neighbors and achieves an average execution time savings of 59 % compared with the conventional Harris score. Moreover, our approach outperforms the competing methods, such as eFAST, eHarris, and FA-Harris, in terms of real-time performance, and surpasses Arc* in terms of accuracy.
Deep reinforcement learning has achieved great success in laser-based collision avoidance work because the laser can sense accurate depth information without too much redundant data, which can maintain the robustness of the algorithm when it is migrated from the simulation environment to the real world. However, high-cost laser devices are not only difficult to apply on a large scale but also have poor robustness to irregular objects, e.g., tables, chairs, shelves, etc. In this paper, we propose a vision-based collision avoidance framework to solve the challenging problem. Our method attempts to estimate the depth and incorporate the semantic information from RGB data to obtain a new form of data, pseudo-laser data, which combines the advantages of visual information and laser information. Compared to traditional laser data that only contains the one-dimensional distance information captured at a certain height, our proposed pseudo-laser data encodes the depth information and semantic information within the image, which makes our method more effective for irregular obstacles. Besides, we adaptively add noise to the laser data during the training stage to increase the robustness of our model in the real world, due to the estimated depth information is not accurate. Experimental results show that our framework achieves state-of-the-art performance in several unseen virtual and real-world scenarios.
We explore the feasibility of using current generation, off-the-shelf, indium gallium arsenide (InGaAs) near-infrared (NIR) detectors for astronomical observations. Light-weight InGaAs cameras, developed for the night vision industry and operated at or near room temperature, enable cost-effective new paths for observing the NIR sky, particularly when paired with small telescopes. We have tested an InGaAs camera in the laboratory and on the sky using 12 and 18-inch telescopes. The camera is a small-format, 320x240 pixels of 40$mu$m pitch, Short Wave Infra-Red (SWIR) device from Sensors Unlimited. Although the device exhibits a room-temperature dark current of $5.7 times 10^4$ $e^-s^{-1}$ per pixel, we find observations of bright sources and low-positional-resolution observations of faint sources remain feasible. We can record unsaturated images of bright ($J=3.9$) sources due to the large pixel well-depth and resulting high dynamic range. When mounted on an 18-inch telescope, the sensor is capable of achieving milli-magnitude precision for sources brighter than $J=8$. Faint sources can be sky-background-limited with modest thermoelectric cooling. We can detect faint sources ($J=16.4$ at $10sigma$) in a one-minute exposure when mounted to an 18-inch telescope. From laboratory testing, we characterize the noise properties, sensitivity, and stability of the camera in a variety of different operational modes and at different operating temperatures. Through sky testing, we show that the (unfiltered) camera can enable precise and accurate photometry, operating like a filtered $J$-band detector, with small color corrections. In the course of our sky testing, we successfully measured sub-percent flux variations in an exoplanet transit. We have demonstrated an ability to detect transient sources in dense fields using image subtraction of existing reference catalogs.
When a toddler is presented a new toy, their instinctual behaviour is to pick it upand inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with. At any instance here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to multi-modal shape understanding which encourages a similar fusion vision and touch information.To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. Our results show that (1) leveraging both vision and touch signals consistently improves single-modality baselines; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) there construction quality increases with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood.
In recent years, camera-based localization has been widely used for robotic applications, and most proposed algorithms rely on local features extracted from recorded images. For better performance, the features used for open-loop localization are required to be short-term globally static, and the ones used for re-localization or loop closure detection need to be long-term static. Therefore, the motion attribute of a local feature point could be exploited to improve localization performance, e.g., the feature points extracted from moving persons or vehicles can be excluded from these systems due to their unsteadiness. In this paper, we design a fully convolutional network (FCN), named MD-Net, to perform motion attribute estimation and feature description simultaneously. MD-Net has a shared backbone network to extract features from the input image and two network branches to complete each sub-task. With MD-Net, we can obtain the motion attribute while avoiding increasing much more computation. Experimental results demonstrate that the proposed method can learn distinct local feature descriptor along with motion attribute only using an FCN, by outperforming competing methods by a wide margin. We also show that the proposed algorithm can be integrated into a vision-based localization algorithm to improve estimation accuracy significantly.