No Arabic abstract
We describe an open-source simulator that creates sensor irradiance and sensor images of typical automotive scenes in urban settings. The purpose of the system is to support camera design and testing for automotive applications. The user can specify scene parameters (e.g., scene type, road type, traffic density, time of day) to assemble a large number of random scenes from graphics assets stored in a database. The sensor irradiance is generated using quantitative computer graphics methods, and the sensor images are created using image systems sensor simulation. The synthetic sensor images have pixel level annotations; hence, they can be used to train and evaluate neural networks for imaging tasks, such as object detection and classification. The end-to-end simulation system supports quantitative assessment, from scene to camera to network accuracy, for automotive applications.
Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in scale and are expensive to compute. In this paper, we present a scalable visual localization approach geared for real-time performance. We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues. Our solution uses a self-supervised approach to learn a compact road image representation. This representation enables efficient visual retrieval and provides coarse localization cues, which are fused with vehicle ego-motion to obtain high accuracy location estimates. As a benchmark to evaluate the performance of our visual localization approach, we introduce a new large-scale driving dataset based on video and GPS data obtained from a large-scale network of connected dash-cams. Our experiments confirm that our approach is highly effective in challenging urban environments, reducing localization error by an order of magnitude.
The automotive industry is being transformed by technologies, applications and services ranging from sensors to big data analytics and to artificial intelligence. In this paper, we present our multidisciplinary initiative of creating a publicly available dataset to facilitate the visual-related marketing research and applications in automotive industry such as automotive exterior design, consumer analytics and sales modelling. We are motivated by the fact that there is growing interest in product aesthetics but there is no large-scale dataset available that covers a wide range of variables and information. We summarise the common issues faced by marketing researchers and computer scientists through a user survey study, and design our dataset to alleviate these issues. Our dataset contains 1.4 million images from 899 car models as well as their corresponding car model specification and sales information over more than ten years in the UK market. To the best of our knowledge, this is the very first large-scale automotive dataset which contains images, text and sales information from multiple sources over a long period of time. We describe the detailed data structure and the preparation steps, which we believe has the methodological contribution to the multi-source data fusion and sharing. In addition, we discuss three dataset application examples to illustrate the value of our dataset.
Neural networks have become increasingly prevalent within the geosciences, although a common limitation of their usage has been a lack of methods to interpret what the networks learn and how they make decisions. As such, neural networks have often been used within the geosciences to most accurately identify a desired output given a set of inputs, with the interpretation of what the network learns used as a secondary metric to ensure the network is making the right decision for the right reason. Neural network interpretation techniques have become more advanced in recent years, however, and we therefore propose that the ultimate objective of using a neural network can also be the interpretation of what the network has learned rather than the output itself. We show that the interpretation of neural networks can enable the discovery of scientifically meaningful connections within geoscientific data. In particular, we use two methods for neural network interpretation called backwards optimization and layerwise relevance propagation, both of which project the decision pathways of a network back onto the original input dimensions. To the best of our knowledge, LRP has not yet been applied to geoscientific research, and we believe it has great potential in this area. We show how these interpretation techniques can be used to reliably infer scientifically meaningful information from neural networks by applying them to common climate patterns. These results suggest that combining interpretable neural networks with novel scientific hypotheses will open the door to many new avenues in neural network-related geoscience research.
Generative adversarial networks (GANs) have demonstrated great success in generating various visual content. However, images generated by existing GANs are often of attributes (e.g., smiling expression) learned from one image domain. As a result, generating images of multiple attributes requires many real samples possessing multiple attributes which are very resource expensive to be collected. In this paper, we propose a novel GAN, namely IntersectGAN, to learn multiple attributes from different image domains through an intersecting architecture. For example, given two image domains $X_1$ and $X_2$ with certain attributes, the intersection $X_1 cap X_2$ denotes a new domain where images possess the attributes from both $X_1$ and $X_2$ domains. The proposed IntersectGAN consists of two discriminators $D_1$ and $D_2$ to distinguish between generated and real samples of different domains, and three generators where the intersection generator is trained against both discriminators. And an overall adversarial loss function is defined over three generators. As a result, our proposed IntersectGAN can be trained on multiple domains of which each presents one specific attribute, and eventually eliminates the need of real sample images simultaneously possessing multiple attributes. By using the CelebFaces Attributes dataset, our proposed IntersectGAN is able to produce high quality face images possessing multiple attributes (e.g., a face with black hair and a smiling expression). Both qualitative and quantitative evaluations are conducted to compare our proposed IntersectGAN with other baseline methods. Besides, several different applications of IntersectGAN have been explored with promising results.
Superpixel segmentation has recently seen important progress benefiting from the advances in differentiable deep learning. However, the very high-resolution superpixel segmentation still remains challenging due to the expensive memory and computation cost, making the current advanced superpixel networks fail to process. In this paper, we devise Patch Calibration Networks (PCNet), aiming to efficiently and accurately implement high-resolution superpixel segmentation. PCNet follows the principle of producing high-resolution output from low-resolution input for saving GPU memory and relieving computation cost. To recall the fine details destroyed by the down-sampling operation, we propose a novel Decoupled Patch Calibration (DPC) branch for collaboratively augment the main superpixel generation branch. In particular, DPC takes a local patch from the high-resolution images and dynamically generates a binary mask to impose the network to focus on region boundaries. By sharing the parameters of DPC and main branches, the fine-detailed knowledge learned from high-resolution patches will be transferred to help calibrate the destroyed information. To the best of our knowledge, we make the first attempt to consider the deep-learning-based superpixel generation for high-resolution cases. To facilitate this research, we build evaluation benchmarks from two public datasets and one new constructed one, covering a wide range of diversities from fine-grained human parts to cityscapes. Extensive experiments demonstrate that our PCNet can not only perform favorably against the state-of-the-arts in the quantitative results but also improve the resolution upper bound from 3K to 5K on 1080Ti GPUs.