In this work, we investigate various methods for semantic labeling of very high resolution multi-modal remote sensing data. In particular, we study how deep fully convolutional networks can be adapted to multi-modal and multi-scale remote sensing data for semantic labeling. Our contributions are threefold: a) we present an efficient multi-scale approach to leverage both a large spatial context and the high resolution data, b) we investigate early and late fusion of Lidar and multispectral data, c) we validate our methods on two public datasets with state-of-the-art results. Our results indicate that late fusion makes it possible to recover errors stemming from ambiguous data, while early fusion allows for better joint feature learning, at the cost of higher sensitivity to missing data.
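To make the early/late fusion distinction concrete, the following is a minimal PyTorch sketch, not the authors' implementation; the toy `small_fcn`, `EarlyFusion`, and `LateFusion` names, the channel counts, and the class count are illustrative assumptions. Early fusion stacks the modalities channel-wise before a single network, while late fusion runs one stream per modality and combines their score maps.

```python
# Minimal sketch (not the authors' code): early vs. late fusion of
# multispectral and Lidar inputs with two tiny fully convolutional streams.
import torch
import torch.nn as nn

def small_fcn(in_ch, n_classes):
    # Hypothetical toy FCN: a few convolutions ending in per-pixel class scores.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, n_classes, 1),
    )

class EarlyFusion(nn.Module):
    """Stack the modalities channel-wise and learn joint features."""
    def __init__(self, ms_ch=3, lidar_ch=1, n_classes=6):
        super().__init__()
        self.fcn = small_fcn(ms_ch + lidar_ch, n_classes)
    def forward(self, ms, lidar):
        return self.fcn(torch.cat([ms, lidar], dim=1))

class LateFusion(nn.Module):
    """Run one stream per modality and average their score maps."""
    def __init__(self, ms_ch=3, lidar_ch=1, n_classes=6):
        super().__init__()
        self.ms_stream = small_fcn(ms_ch, n_classes)
        self.lidar_stream = small_fcn(lidar_ch, n_classes)
    def forward(self, ms, lidar):
        return 0.5 * (self.ms_stream(ms) + self.lidar_stream(lidar))

if __name__ == "__main__":
    ms, lidar = torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64)
    print(EarlyFusion()(ms, lidar).shape, LateFusion()(ms, lidar).shape)
```

The fixed average in `LateFusion` could be replaced by a learned combination of the two score maps.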
To reduce storage requirements, remote sensing (RS) images are usually stored in compressed format. Existing scene classification approaches using deep neural networks (DNNs) require the images to be fully decompressed, which is a computationally demanding task in operational applications. To address this issue, in this paper we propose a novel approach to achieve scene classification in JPEG 2000 compressed RS images. The proposed approach consists of two main steps: i) approximation of the finer resolution sub-bands of the reversible biorthogonal wavelet filters used in JPEG 2000; and ii) characterization of the high-level semantic content of the approximated wavelet sub-bands and scene classification based on the learnt descriptors. This is achieved by taking codestreams associated with the coarsest resolution wavelet sub-band as input and approximating the finer resolution sub-bands using a number of transposed convolutional layers. Then, a series of convolutional layers models the high-level semantic content of the approximated wavelet sub-bands. Thus, the proposed approach models the multiresolution paradigm of the JPEG 2000 compression algorithm in an end-to-end trainable unified neural network. In the classification stage, the proposed approach takes only the coarsest resolution wavelet sub-bands as input, thereby reducing the decoding time. Experimental results obtained on two benchmark aerial image archives demonstrate that the proposed approach significantly reduces the computational time while achieving similar classification accuracy compared to traditional RS scene classification approaches (which require full image decompression).
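As an illustration of the two-step pipeline described above (sub-band approximation via transposed convolutions, followed by convolutional feature learning and classification), here is a hedged PyTorch sketch; the class name `CompressedDomainClassifier`, the number of layers, and all channel sizes are assumptions rather than the paper's actual configuration.

```python
# Minimal sketch (assumptions, not the paper's code): classify a scene from
# only the coarsest JPEG 2000 wavelet sub-band by first approximating finer
# sub-bands with transposed convolutions, then applying a small conv classifier.
import torch
import torch.nn as nn

class CompressedDomainClassifier(nn.Module):
    def __init__(self, in_ch=1, n_classes=10):
        super().__init__()
        # Approximate two finer resolution levels (each transposed conv doubles the size).
        self.approx = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Model the semantic content of the approximated sub-bands.
        self.features = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, coarse_subband):
        x = self.features(self.approx(coarse_subband))
        return self.classifier(x.flatten(1))

if __name__ == "__main__":
    # e.g. a 32x32 sub-band from the coarsest decomposition level
    print(CompressedDomainClassifier()(torch.randn(2, 1, 32, 32)).shape)
```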
Single image super-resolution is an effective way to enhance the spatial resolution of remote sensing images, which is crucial for many applications such as target detection and image classification. However, existing neural-network-based methods usually have small receptive fields and ignore image detail. We propose a novel method named deep memory connected network (DMCN), based on a convolutional neural network, to reconstruct high-quality super-resolution images. We build local and global memory connections to combine image detail with environmental information. To further reduce the number of parameters and the computational cost, we propose downsampling units that shrink the spatial size of the feature maps. We test DMCN on three remote sensing datasets with different spatial resolutions. Experimental results indicate that our method yields promising improvements in both accuracy and visual quality over the current state of the art.
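A rough sketch of the ingredients named above (a local skip connection inside a block, a global skip connection around a downsampling/upsampling pair), assuming PyTorch; `MemoryBlock`, `TinySRNet`, and all layer sizes are hypothetical and far smaller than a real DMCN.

```python
# Hypothetical sketch of the ideas described above (local/global skip
# connections plus a downsampling unit); layer sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryBlock(nn.Module):
    """Conv block with a local skip ("memory") connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)          # local memory: keep the block input

class TinySRNet(nn.Module):
    def __init__(self, ch=32, upscale=2):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)   # downsampling unit
        self.blocks = nn.Sequential(MemoryBlock(ch), MemoryBlock(ch))
        self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)
        self.tail = nn.Conv2d(ch, 1, 3, padding=1)
        self.upscale = upscale
    def forward(self, lr):
        feat = self.head(lr)
        x = self.up(self.blocks(self.down(feat))) + feat        # global memory
        sr = self.tail(x)
        return F.interpolate(sr, scale_factor=self.upscale, mode="bilinear",
                             align_corners=False)

if __name__ == "__main__":
    print(TinySRNet()(torch.randn(1, 1, 48, 48)).shape)   # -> (1, 1, 96, 96)
```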
Semantic segmentation is an essential task in deep learning. In recent years, with the development of remote sensing big data, semantic segmentation has been increasingly used in remote sensing. Deep convolutional neural networks (DCNNs) face the challenge of feature fusion: multisource data fusion for very-high-resolution remote sensing images can increase the information a network can learn from, which helps DCNNs classify target objects correctly; at the same time, the fusion of high-level abstract features and low-level spatial features can improve the classification accuracy at the borders between target objects. In this paper, we propose a multipath encoder structure to extract features from multipath inputs, a multipath attention-fused block module to fuse multipath features, and a refinement attention-fused block module to fuse high-level abstract features and low-level spatial features. Furthermore, we propose a novel convolutional neural network architecture, named attention-fused network (AFNet). With AFNet, we achieve state-of-the-art performance with an overall accuracy of 91.7% and a mean F1 score of 90.96% on the ISPRS Vaihingen 2D dataset, and an overall accuracy of 92.1% and a mean F1 score of 93.44% on the ISPRS Potsdam 2D dataset.
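To illustrate what an attention-based fusion of two feature paths can look like, here is a small PyTorch sketch; the `AttentionFuse` module is a generic squeeze-and-excitation-style gate of our own naming, not the AFNet blocks themselves.

```python
# Illustrative sketch (not the AFNet implementation): fusing two feature
# paths with a channel-attention gate, in the spirit of an attention-fused block.
import torch
import torch.nn as nn

class AttentionFuse(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.merge = nn.Conv2d(2 * ch, ch, 1)
        self.gate = nn.Sequential(            # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid(),
        )
    def forward(self, a, b):
        fused = self.merge(torch.cat([a, b], dim=1))
        return fused * self.gate(fused)       # re-weight channels of the fused map

if __name__ == "__main__":
    a, b = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    print(AttentionFuse(64)(a, b).shape)
```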
Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have garnered growing attention owing to recent advances in deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to limited information diversity. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML) -- cross-modality learning (CML) -- that exists widely in RS image classification applications. By focusing on what, where, and how to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the MML and CML settings are conducted on two different multimodal RS datasets. Furthermore, the codes and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.
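The what/where-to-fuse question can be illustrated with a compact PyTorch sketch in which the fusion point (early on the inputs, in the middle on features, or late on score maps) is a constructor choice; the `TwoStreamFusion` class, its layer sizes, and the channel counts are assumptions for illustration only, and the linked repository contains the actual architectures.

```python
# Rough sketch (assumptions only, see the linked repository for the real code):
# a two-stream network where the fusion point is a constructor choice.
import torch
import torch.nn as nn

def encoder(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TwoStreamFusion(nn.Module):
    def __init__(self, ch_a, ch_b, n_classes, strategy="middle"):
        super().__init__()
        self.strategy = strategy
        if strategy == "early":
            self.enc = encoder(ch_a + ch_b, 64)
            self.head = nn.Conv2d(64, n_classes, 1)
        elif strategy == "middle":
            self.enc_a, self.enc_b = encoder(ch_a, 32), encoder(ch_b, 32)
            self.head = nn.Conv2d(64, n_classes, 1)
        else:  # "late": fuse per-class score maps
            self.enc_a = nn.Sequential(encoder(ch_a, 32), nn.Conv2d(32, n_classes, 1))
            self.enc_b = nn.Sequential(encoder(ch_b, 32), nn.Conv2d(32, n_classes, 1))

    def forward(self, a, b):
        if self.strategy == "early":
            return self.head(self.enc(torch.cat([a, b], dim=1)))
        if self.strategy == "middle":
            return self.head(torch.cat([self.enc_a(a), self.enc_b(b)], dim=1))
        return 0.5 * (self.enc_a(a) + self.enc_b(b))

if __name__ == "__main__":
    a, b = torch.randn(1, 4, 32, 32), torch.randn(1, 144, 32, 32)
    for s in ("early", "middle", "late"):
        print(s, TwoStreamFusion(4, 144, 7, s)(a, b).shape)
```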
In this work, we present a novel module to perform fusion of heterogeneous data using fully convolutional networks for semantic labeling. We introduce residual correction as a way to learn how to fuse predictions coming out of a dual-stream architecture. In particular, we perform fusion of DSM and IRRG optical data on the ISPRS Vaihingen dataset over an urban area and obtain new state-of-the-art results.
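A minimal sketch of what a residual-correction style fusion can look like, assuming PyTorch: a small network predicts a correction that is added to the averaged score maps of the two streams. The `ResidualCorrection` name and layer sizes are illustrative, not the authors' code.

```python
# Minimal sketch of a residual-correction style fusion (illustrative, not the
# authors' code): a small network learns a correction added to the averaged
# predictions of two streams.
import torch
import torch.nn as nn

class ResidualCorrection(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.correct = nn.Sequential(
            nn.Conv2d(2 * n_classes, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, n_classes, 3, padding=1),
        )
    def forward(self, scores_irrg, scores_dsm):
        avg = 0.5 * (scores_irrg + scores_dsm)
        # The learned term corrects the naive average where the streams disagree.
        return avg + self.correct(torch.cat([scores_irrg, scores_dsm], dim=1))

if __name__ == "__main__":
    s1, s2 = torch.randn(1, 6, 64, 64), torch.randn(1, 6, 64, 64)
    print(ResidualCorrection(6)(s1, s2).shape)
```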