Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding



Added by Erik Rodner

Publication date: 2015

Fields: Informatics Engineering

Research language: English

Authors: Clemens-Alexander Brust, Sven Sickert, Marcel Simon

Computer Vision and Pattern Recognition

Abstract

Classifying single image patches is important in many applications, such as road detection and scene understanding. In this paper, we present convolutional patch networks: convolutional networks (CNs) trained to distinguish image patches, which can be used for pixel-wise labeling. We also show how to provide the spatial position of a patch as an input to the network, which allows spatial priors for certain categories to be learned jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas in which we achieve state-of-the-art results on both the KITTI and the LabelMeFacade datasets. Furthermore, the paper offers guidelines for practitioners struggling with the many painstaking details that make training CNs on image patches extremely difficult.
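Pixel-wise labeling with a patch classifier means classifying the patch centred on every pixel, and the spatial prior means giving the network the patch's position as an extra input. A minimal NumPy sketch of that data flow follows. It assumes zero padding at the image border, an odd patch size, and a normalised (row/H, col/W) position encoding, none of which are taken from the paper; `toy_classify` is a hypothetical stand-in for a trained convolutional patch network.

```python
import numpy as np

def extract_patches(image, size):
    """Return one size x size patch per pixel plus its normalised (row, col) position.

    The image is zero-padded so border pixels also get a full patch (assumption).
    """
    if size % 2 == 0:
        raise ValueError("patch size must be odd")
    half = size // 2
    h, w = image.shape[:2]
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    patches, positions = [], []
    for r in range(h):
        for c in range(w):
            # padded[r + half, c + half] is original pixel (r, c), the patch centre
            patches.append(padded[r:r + size, c:c + size])
            # assumed encoding of the spatial-prior input: position scaled to [0, 1)
            positions.append((r / h, c / w))
    return np.stack(patches), np.asarray(positions, dtype=np.float32)

def label_image(image, size, classify):
    """Pixel-wise labeling: classify every patch together with its position."""
    patches, positions = extract_patches(image, size)
    return np.asarray(classify(patches, positions)).reshape(image.shape[:2])

def toy_classify(patches, positions):
    """Hypothetical stand-in for a trained network: a bright patch (appearance)
    in the lower half of the image (spatial prior) is labeled 1, else 0."""
    appearance = patches.reshape(len(patches), -1).mean(axis=1)
    return ((appearance > 0.5) & (positions[:, 0] >= 0.5)).astype(int)
```

On a 4 x 4 all-ones image with 3 x 3 patches, only the lower two rows can be labeled 1, and the two bottom corners stay 0 because zero padding lowers their mean brightness. Because position enters as an input alongside the pixels, a learned classifier can combine appearance with location-dependent priors instead of relying on appearance alone.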


Quantifying spatial homogeneity of urban road networks via graph neural networks

Jiawei Xue, Nan Jiang, Senwei Liang, 2021

The spatial homogeneity of an urban road network (URN) measures whether each distinct component is analogous to the whole network, and it can serve as a quantitative bridge between network structure and dynamics. However, given the complexity of cities, it is challenging to quantify spatial homogeneity using conventional network statistics alone. In this work, we use Graph Neural Networks to model 11,790 URN samples across 30 cities worldwide and use the model's predictability to define spatial homogeneity. The proposed measure can be viewed as a non-linear integration of multiple geometric properties, such as degree, betweenness, and road network type, and it is a strong indicator of mixed socio-economic events such as GDP and population growth. City clusters derived from transferring spatial homogeneity can be interpreted well through continental urbanization histories. We expect this novel metric to support various subsequent tasks in transportation, urban planning, and geography.

Physics and Society Machine Learning Social and Information Networks

Training Constrained Deconvolutional Networks for Road Scene Semantic Segmentation

German Ros, Simon Stent, Pablo F. Alcantarilla, 2016

In this work we investigate the problem of road scene semantic segmentation using Deconvolutional Networks (DNs). Several constraints limit the practical performance of DNs in this context: firstly, the paucity of existing pixel-wise labelled training data, and secondly, the memory constraints of embedded hardware, which rule out the practical use of state-of-the-art DN architectures such as fully convolutional networks (FCN). To address the first constraint, we introduce a Multi-Domain Road Scene Semantic Segmentation (MDRS3) dataset, aggregating data from six existing densely and sparsely labelled datasets for training our models, and two existing, separate datasets for testing their generalisation performance. We show that, while MDRS3 offers a greater volume and variety of data, end-to-end training of a memory efficient DN does not yield satisfactory performance. We propose a new training strategy to overcome this, based on (i) the creation of a best-possible source network (S-Net) from the aggregated data, ignoring time and memory constraints; and (ii) the transfer of knowledge from S-Net to the memory-efficient target network (T-Net). We evaluate different techniques for S-Net creation and T-Net transferral, and demonstrate that training a constrained deconvolutional network in this manner can unlock better performance than existing training approaches. Specifically, we show that a target network can be trained to achieve improved accuracy versus an FCN despite using less than 1% of the memory. We believe that our approach can be useful beyond automotive scenarios where labelled data is similarly scarce or fragmented and where practical constraints exist on the desired model size. We make available our network models and aggregated multi-domain dataset for reproducibility.
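The abstract does not say which S-Net to T-Net transfer techniques were evaluated. One widely used option is knowledge distillation with temperature-softened soft targets (Hinton et al.), sketched below per pixel in NumPy. This is an illustrative assumption, not necessarily the technique the authors found best, and the `temperature` and `alpha` defaults are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0, axis=-1):
    z = logits / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Per-pixel distillation from a large source network to a small target network.

    student_logits, teacher_logits: (num_pixels, num_classes); labels: (num_pixels,) ints.
    alpha and temperature are hypothetical hyper-parameters.
    """
    eps = 1e-12
    # soft term: KL(teacher || student) on temperature-softened distributions
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1).mean()
    # hard term: ordinary cross-entropy of the student against ground-truth labels
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    # the T^2 factor keeps soft-target gradients on a comparable scale as T changes
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

Flattening an H x W prediction to shape (H*W, num_classes) before calling the function turns it into a per-pixel loss, so labeled pixels contribute a hard term while every pixel, labeled or not, receives the source network's soft targets.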

Computer Vision and Pattern Recognition

Joint Spatial and Layer Attention for Convolutional Networks

Tony Joseph, Konstantinos G. Derpanis, Faisal Z. Qureshi, 2019

In this paper, we propose a novel approach that learns to sequentially attend to different Convolutional Neural Network (CNN) layers (i.e., "what" feature abstraction to attend to) and different spatial locations of the selected feature map (i.e., "where") to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) step, both a CNN layer and a localized spatial region within it are selected for further processing. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based six-degree-of-freedom camera pose regression and (ii) indoor scene classification. Empirically, we show that combining the "what" and "where" aspects of attention improves network performance on both tasks. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scenes, and TUM-LSI) and for scene classification (MIT-67 Indoor Scenes). For camera localization our approach reduces the median error by 18.8% for position and 8.2% for orientation (averaged over all scenes), and for scene classification it improves the mean accuracy by 3.4% over previous methods.

Computer Vision and Pattern Recognition

Context Prior for Scene Segmentation

Changqian Yu, Jingbo Wang, Changxin Gao, 2020

Recent works have widely explored contextual dependencies to achieve more accurate segmentation results. However, most approaches do not distinguish between different types of contextual dependencies, which can corrupt scene understanding. In this work, we directly supervise the feature aggregation to clearly distinguish intra-class and inter-class context. Specifically, we develop a Context Prior supervised by an Affinity Loss. Given an input image and the corresponding ground truth, the Affinity Loss constructs an ideal affinity map to supervise the learning of the Context Prior. The learned Context Prior extracts the pixels belonging to the same category, while the reversed prior focuses on the pixels of different classes. Embedded into a conventional deep CNN, the proposed Context Prior Layer can selectively capture intra-class and inter-class contextual dependencies, leading to robust feature representations. To validate its effectiveness, we design a Context Prior Network (CPNet). Extensive quantitative and qualitative evaluations demonstrate that the proposed model performs favorably against state-of-the-art semantic segmentation approaches. More specifically, our algorithm achieves 46.3% mIoU on ADE20K, 53.9% mIoU on PASCAL-Context, and 81.3% mIoU on Cityscapes. Code is available at https://git.io/ContextPrior.
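The ideal affinity map can be built directly from the ground truth: entry (i, j) is 1 when pixels i and j belong to the same class, and one minus the map marks the inter-class pairs that the reversed prior attends to. A NumPy sketch follows, assuming the labels are already downsampled to the feature-map resolution; only a unary binary cross-entropy term is shown, the paper's Affinity Loss may include further terms not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def ideal_affinity_map(labels, num_classes):
    """labels: (H, W) integer map at feature resolution (assumed pre-downsampled).

    Returns an (N, N) binary map with N = H * W; entry (i, j) is 1 iff
    pixels i and j share a class.
    """
    flat = labels.reshape(-1)
    one_hot = np.eye(num_classes)[flat]  # (N, C) one-hot encoding
    return one_hot @ one_hot.T           # (N, N): dot product is 1 iff same class

def unary_affinity_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between a predicted prior map in (0, 1) and the ideal map."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
```

For a 2 x 2 label map [[0, 1], [1, 0]], pixels 0 and 3 share class 0 and pixels 1 and 2 share class 1, so the 4 x 4 ideal map has ones exactly at those pairs and on the diagonal.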

Computer Vision and Pattern Recognition

Spatio-Temporal Graph Convolutional Networks for Road Network Inundation Status Prediction during Urban Flooding

Faxi Yuan, Yuanchang Xu, Qingchun Li, 2021

The objective of this study is to predict the near-future flooding status of road segments from their own current status and that of adjacent road segments, using a deep learning framework on fine-grained traffic data. Predictive flood monitoring for situational awareness of road network status plays a critical role in supporting crisis response activities, such as evaluating the loss of access to hospitals and shelters. Studies on near-future prediction of road network flooding status at the road-segment level are lacking. Using fine-grained traffic speed data for road sections, this study designed and implemented three spatio-temporal graph convolutional network (STGCN) models to predict road network status during flood events at the road-segment level, in the context of Hurricane Harvey (2017) in Harris County (Texas, USA). Model 1 consists of two spatio-temporal blocks that consider the adjacency and distance between road segments, while Model 2 contains an additional elevation block to account for elevation differences between road segments. Model 3 includes three blocks that consider the adjacency and the product of distance and elevation difference between road segments. The analysis tested the STGCN models and evaluated their prediction performance. Our results indicate that Model 1 and Model 2 predict road network flooding status reliably and accurately in the near future (e.g., 2-4 hours), with precision and recall values above 98% and 96%, respectively. With reliable road network status predictions during floods, the proposed model can help affected communities avoid flooded roads and help emergency management agencies implement evacuation and relief resource delivery plans.
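Combining adjacency with distance between road segments, as in Model 1, can be illustrated by a distance-weighted adjacency matrix that feeds a normalised graph convolution. The NumPy sketch below assumes a Gaussian distance kernel with a hypothetical bandwidth `sigma` and the first-order GCN propagation rule D^-1/2 (W + I) D^-1/2 X Theta. These are common choices in STGCN-style models, not details confirmed by the abstract, and the temporal convolutions of a full spatio-temporal block are omitted.

```python
import numpy as np

def distance_weighted_adjacency(adjacency, distance, sigma):
    """Gaussian-kernel edge weights, kept only between adjacent road segments.

    adjacency: (N, N) 0/1 matrix; distance: (N, N) pairwise distances;
    sigma: hypothetical kernel bandwidth.
    """
    weights = np.exp(-(distance ** 2) / sigma ** 2)
    return adjacency * weights  # zero out non-adjacent pairs

def graph_convolution(weights, features, theta):
    """One first-order graph convolution: D^-1/2 (W + I) D^-1/2 X Theta.

    features: (N, F_in) per-segment inputs (e.g. traffic speed); theta: (F_in, F_out).
    """
    w_hat = weights + np.eye(len(weights))  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(w_hat.sum(axis=1))
    norm = w_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalisation
    return norm @ features @ theta
```

Each segment's output mixes its own features with those of its neighbours, with closer neighbours weighted more heavily; with no edges at all, the normalised matrix reduces to the identity and the layer becomes a plain per-segment linear map.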

Machine Learning Physics and Society