Shift Equivariance for Pixel-based Self-supervised SAR-optical Feature Fusion

Added by Yuxing Chen

Publication date: 2021

Field: Electronic Engineering

Language: English

Authors: Yuxing Chen, Lorenzo Bruzzone

Image and Video Processing

The effective combination of the complementary information provided by the huge amount of unlabeled multi-sensor data (e.g., Synthetic Aperture Radar (SAR) and optical images) is a critical topic in remote sensing. Recently, contrastive learning methods have achieved remarkable success in obtaining meaningful feature representations from multi-view data. However, these methods focus only on image-level features, which may not satisfy the requirements of dense prediction tasks such as land-cover mapping. In this work, we propose a new self-supervised approach to SAR-optical data fusion that learns disentangled pixel-wise feature representations directly by taking advantage of both a multi-view contrastive loss and the bootstrap your own latent (BYOL) method. The two key contributions of the proposed approach are a multi-view contrastive loss to encode the multimodal images and a shift operation to reconstruct learned representations for each pixel by building local consistency between different augmented views. In the experiments, we first verified the effectiveness of the multi-view contrastive loss and BYOL in self-supervised learning on SAR-optical fusion using an image-level classification task. We then validated the proposed approach on a land-cover mapping task by training it with unlabeled SAR-optical image pairs, using labeled data pairs to evaluate the discriminative capability of the learned features in downstream tasks. Results show that the proposed approach extracts features that yield higher accuracy and reduces the dimension of the representations relative to the image-level contrastive learning method.
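To make the idea of a multi-view contrastive loss concrete, the sketch below computes a generic InfoNCE-style loss between paired SAR and optical embeddings. It is only an illustration under stated assumptions: the function names (`l2_normalize`, `info_nce`), the temperature value, and the use of plain Python lists are hypothetical, and the abstract does not specify the exact loss used by the authors or how the shift operation is implemented.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce(sar_feats, opt_feats, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    Assumption: sar_feats[i] and opt_feats[i] come from the same location,
    so they form the positive pair; all other optical embeddings in the
    batch act as negatives.
    """
    sar = [l2_normalize(v) for v in sar_feats]
    opt = [l2_normalize(v) for v in opt_feats]
    total = 0.0
    for i, s in enumerate(sar):
        logits = [sum(a * b for a, b in zip(s, o)) / temperature for o in opt]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(sar)

sar_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(sar_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(sar_batch, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each SAR embedding is most similar to its own optical partner and large when the pairing is broken, which is the property a contrastive objective relies on.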

The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion

Meiyu Huang, Yao Xu, Lixin Qian (2021)

Deep learning techniques have made an increasing impact on the field of remote sensing. However, deep-neural-network-based fusion of multimodal data from different remote sensors with heterogeneous characteristics has not been fully explored, owing to the lack of large amounts of perfectly aligned, high-resolution multi-sensor image data covering diverse scenes, especially for synthetic aperture radar (SAR) data and optical imagery. To promote the development of deep-learning-based SAR-optical fusion approaches, we release the QXS-SAROPT dataset, which contains 20,000 pairs of SAR-optical image patches. We obtain the SAR patches from SAR satellite GaoFen-3 images and the optical patches from Google Earth images. These images cover three port cities: San Diego, Shanghai and Qingdao. Here, we present a detailed introduction to the construction of the dataset and show two representative applications, namely SAR-optical image matching and SAR ship detection boosted by cross-modal information from optical images. As a large, open, high-resolution SAR-optical dataset with multiple scenes, we believe QXS-SAROPT will be of value for further research in deep-learning-based SAR-optical data fusion.

Image and Video Processing Computer Vision and Pattern Recognition Machine Learning

Self-Supervised Feature Extraction for 3D Axon Segmentation

Tzofi Klinghoffer, Peter Morales, Young-Gyun Park (2020)

Existing learning-based methods to automatically trace axons in 3D brain imagery often rely on manually annotated segmentation labels. Labeling is a labor-intensive process and is not scalable to whole-brain analysis, which is needed for improved understanding of brain function. We propose a self-supervised auxiliary task that utilizes the tube-like structure of axons to build a feature extractor from unlabeled data. The proposed auxiliary task constrains a 3D convolutional neural network (CNN) to predict the order of permuted slices in an input 3D volume. By solving this task, the 3D CNN is able to learn features without ground-truth labels that are useful for downstream segmentation with the 3D U-Net model. To the best of our knowledge, our model is the first to perform automated segmentation of axons imaged at subcellular resolution with the SHIELD technique. We demonstrate improved segmentation performance over the 3D U-Net model on both the SHIELD PVGPe dataset and the BigNeuron Project, single neuron Janelia dataset.
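The slice-ordering pretext task described above can be illustrated with a small data-generation sketch. This is a hypothetical construction, not the authors' pipeline: the helper names (`make_sample`, `restore`), the choice to permute individual slices, and the encoding of the label as an index into the list of all permutations are assumptions made for illustration.

```python
import itertools
import random

def make_sample(volume, rng):
    """Shuffle the depth slices of a volume and return (shuffled, label).

    Assumption: the label is the index of the applied permutation in the
    lexicographic list of all permutations, so a classifier with
    len(volume)! outputs could be trained to predict it.
    """
    perms = list(itertools.permutations(range(len(volume))))
    label = rng.randrange(len(perms))
    shuffled = [volume[i] for i in perms[label]]
    return shuffled, label

def restore(shuffled, label):
    """Undo the permutation identified by label."""
    perm = list(itertools.permutations(range(len(shuffled))))[label]
    restored = [None] * len(shuffled)
    for position, source in enumerate(perm):
        restored[source] = shuffled[position]
    return restored

# A toy volume with 4 slices, each a 1x1 image holding its depth index.
volume = [[[z]] for z in range(4)]
shuffled, label = make_sample(volume, random.Random(0))
print(restore(shuffled, label) == volume)
```

Because the number of permutations grows factorially with the number of slices, a practical setup would likely restrict the task to a small number of slices or a fixed subset of permutations; the abstract does not say which choice the authors made.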

Image and Video Processing Computer Vision and Pattern Recognition

Fully Polarimetric SAR and Single-Polarization SAR Image Fusion Network

Liupeng Lin, Jie Li, Huanfeng Shen (2021)

Data fusion technology aims to aggregate the characteristics of different data sources and obtain products that combine their advantages. To solve the problem of the reduced resolution of PolSAR images caused by system limitations, we propose a fusion network for fully polarimetric synthetic aperture radar (PolSAR) images and single-polarization synthetic aperture radar (SinSAR) images that generates high-resolution PolSAR (HR-PolSAR) images. To take advantage of the polarimetric information of the low-resolution PolSAR (LR-PolSAR) image and the spatial information of the high-resolution single-polarization SAR (HR-SinSAR) image, we propose a fusion framework that takes the LR-PolSAR and HR-SinSAR images as joint input and design a cross-attention mechanism to extract features from the joint input data. In addition, based on the physical imaging mechanism, we design a PolSAR polarimetric loss function to constrain network training. The experimental results confirm the superiority of the fusion network over traditional algorithms. The average PSNR is increased by more than 3.6 dB, and the average MAE is reduced to less than 0.07. Experiments on polarimetric decomposition and polarimetric signature show that the network preserves polarimetric information well.
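For readers unfamiliar with the two metrics quoted above, the following sketch shows the standard definitions of MAE and PSNR on flattened pixel lists. The peak value of 1.0 and the toy pixel values are assumptions; the abstract does not state how the images were normalized or which peak value the authors used.

```python
import math

def mae(reference, estimate):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; peak is the assumed maximum pixel value."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [0.0, 0.5, 1.0]
estimate = [0.1, 0.5, 0.9]
print(round(mae(reference, estimate), 4), round(psnr(reference, estimate), 2))
```

Higher PSNR and lower MAE both indicate a reconstruction closer to the reference, which is how the reported improvements should be read.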

Image and Video Processing Computer Vision and Pattern Recognition

Cross Pixel Optical Flow Similarity for Self-Supervised Learning

Aravindh Mahendran, James Thewlis, Andrea Vedaldi (2018)

We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues in the form of optical flow to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler learning goal: embed pixels such that the similarity between their embeddings matches that between their optical flow vectors. At test time, the learned deep network can be used without access to video or flow information and transferred to tasks such as image classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision in general, and is overall state of the art in self-supervised pretraining for semantic image segmentation, as demonstrated on standard benchmarks.
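The learning goal stated above, matching embedding similarity to optical-flow similarity, can be sketched as a simple pairwise objective. This is a simplified stand-in under stated assumptions, not the loss from the paper: cosine similarity, the squared-difference penalty, and the names `cosine` and `similarity_matching_loss` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations

def cosine(u, v, eps=1e-8):
    """Cosine similarity with a small eps to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def similarity_matching_loss(embeddings, flows):
    """Mean squared gap between embedding similarity and flow similarity
    over all pixel pairs (an assumed, simplified objective)."""
    pairs = list(combinations(range(len(embeddings)), 2))
    gaps = [
        (cosine(embeddings[i], embeddings[j]) - cosine(flows[i], flows[j])) ** 2
        for i, j in pairs
    ]
    return sum(gaps) / len(pairs)

flows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
print(similarity_matching_loss(flows, flows) < similarity_matching_loss(collapsed, flows))
```

Embeddings whose pairwise similarities mirror those of the flow vectors give zero loss, while embeddings that collapse to a single vector are penalized whenever the corresponding pixels move differently.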

Computer Vision and Pattern Recognition Machine Learning Neural and Evolutionary Computing

NeighCNN: A CNN based SAR Speckle Reduction using Feature preserving Loss Function

Praveen Ravirathinam, Darshan Agrawal, J. Jennifer Ranjani (2021)

Coherent imaging systems such as synthetic aperture radar are susceptible to multiplicative noise, which makes applications like automatic target recognition challenging. In this paper, we propose NeighCNN, a deep-learning-based speckle reduction algorithm that handles multiplicative noise with a relatively simple convolutional neural network architecture. We have designed a loss function that is a unique weighted sum of Euclidean, neighbourhood, and perceptual losses for training the deep network. The Euclidean and neighbourhood losses take pixel-level information into account, whereas the perceptual loss considers high-level semantic features between two images. Various synthetic and real SAR images are used to test the NeighCNN architecture, and the results verify the noise removal and edge preservation abilities of the proposed architecture. Performance metrics such as peak signal-to-noise ratio, structural similarity index, and universal image quality index are used to evaluate the performance of the proposed architecture on synthetic images.

Image and Video Processing Computer Vision and Pattern Recognition