The convolution operation is a powerful tool for feature extraction and plays a prominent role in computer vision. However, for pixel-wise tasks such as image fusion, applying the same convolution kernel to every patch cannot fully capture the particularity of each pixel in the image. In this paper, we propose a local adaptive convolution (LAConv), which is dynamically adjusted to different spatial locations. LAConv enables the network to pay attention to every specific local area in the learning process. In addition, a dynamic bias (DYB) is introduced to provide more possibilities for the depiction of features and to make the network more flexible. We further design a residual network equipped with the proposed LAConv and DYB modules, and apply it to two image fusion tasks. Experiments on pansharpening and hyperspectral image super-resolution (HISR) demonstrate the superiority of our method over other state-of-the-art methods. It is worth mentioning that LAConv can also handle other super-resolution tasks with less computational effort.
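To make the idea of a location-dependent kernel concrete, the following is a minimal sketch, not the LAConv module itself. It assumes that a shared kernel is rescaled at every pixel by factors derived from the local patch; the names `local_adaptive_conv` and `center_similarity` are hypothetical, and the hand-written modulator stands in for whatever the network would learn. The dynamic bias is omitted.

```python
import numpy as np

def local_adaptive_conv(image, kernel, modulator):
    """Apply a shared k x k kernel whose weights are rescaled at each pixel.

    image:     (H, W) array
    kernel:    (k, k) shared weights, k odd
    modulator: hypothetical function mapping a (k, k) patch to (k, k) scaling factors
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + k, j:j + k]
            # The effective kernel differs from pixel to pixel.
            local_kernel = kernel * modulator(patch)
            out[i, j] = np.sum(patch * local_kernel)
    return out

def center_similarity(patch):
    """Assumed modulator: down-weight neighbors that differ from the center pixel."""
    center = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    return 1.0 / (1.0 + np.abs(patch - center))

# A vertical step edge: the adaptive kernel suppresses pixels across the edge.
image = np.ones((6, 6))
image[:, 3:] = 3.0
kernel = np.full((3, 3), 1.0 / 9.0)
uniform = local_adaptive_conv(image, kernel, lambda p: np.ones_like(p))
adaptive = local_adaptive_conv(image, kernel, center_similarity)
```

On flat regions the two outputs coincide, because the modulator returns all ones; they differ only near the edge, which is where a uniform kernel mixes pixels from both sides.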
The field of neural image compression has witnessed exciting progress as recently proposed architectures already surpass the established transform coding based approaches. While, so far, research has mainly focused on architecture and model improvements, in this work we explore content adaptive optimization. To this end, we introduce an iterative procedure which adapts the latent representation to the specific content we wish to compress while keeping the parameters of the network and the predictive model fixed. Our experiments show that this allows for an overall increase in rate-distortion performance, independently of the specific architecture used. Furthermore, we also evaluate this strategy in the context of adapting a pretrained network to other content that is different in visual appearance or resolution. Here, our experiments show that our adaptation strategy can largely close the gap as compared to models specifically trained for the given content while having the benefit that no additional data in the form of model parameter updates has to be transmitted.
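The iterative procedure described above can be illustrated with a toy sketch. It assumes a fixed linear decoder in place of the neural decoder and a squared-norm penalty on the latent in place of the learned rate term; `refine_latent`, `rd_loss`, and `lam` are hypothetical names. Only the latent is updated, and the decoder is never modified.

```python
import numpy as np

def rd_loss(x, y, decoder, lam):
    """Toy rate-distortion objective: squared error plus a penalty on the latent."""
    residual = x - decoder @ y
    return float(residual @ residual + lam * (y @ y))

def refine_latent(x, y_init, decoder, lam=0.1, iters=200):
    """Gradient descent on the latent y only; the decoder stays fixed."""
    # Step size 1/L, where L = 2 * (sigma_max^2 + lam) bounds the gradient's
    # Lipschitz constant, so the loss never increases from one step to the next.
    step = 1.0 / (2.0 * (np.linalg.norm(decoder, 2) ** 2 + lam))
    y = y_init.astype(float)  # astype returns a copy, so y_init is left untouched
    for _ in range(iters):
        grad = -2.0 * decoder.T @ (x - decoder @ y) + 2.0 * lam * y
        y -= step * grad
    return y

rng = np.random.default_rng(0)
decoder = rng.normal(size=(16, 4))   # stand-in for a pretrained decoder
x = rng.normal(size=16)              # content to compress
y_init = 0.05 * decoder.T @ x        # stand-in for an imperfect encoder output
y_refined = refine_latent(x, y_init, decoder)
```

Because the decoder is unchanged, a receiver holding the same decoder can reconstruct from `y_refined` without receiving any parameter updates.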
Multi-focus image fusion (MFIF) has attracted considerable interest due to its numerous applications. While much progress has been made in recent years in developing various MFIF algorithms, some issues significantly hinder fair and comprehensive performance comparison of MFIF methods, such as the lack of a large-scale test set and the arbitrary choice of objective evaluation metrics in the literature. To address these issues, this paper presents a multi-focus image fusion benchmark (MFIFB), which consists of a test set of 105 image pairs, a code library of 30 MFIF algorithms, and 20 evaluation metrics. MFIFB is the first benchmark in the field of MFIF and provides the community with a platform to compare MFIF algorithms fairly and comprehensively. Extensive experiments have been conducted using the proposed MFIFB to understand the performance of these algorithms. By analyzing the experimental results, effective MFIF algorithms are identified. More importantly, some observations on the status of the MFIF field are given, which can help readers understand this field better.
Image decomposition is a crucial subject in the field of image processing. It can extract salient features from the source image. We propose a new image decomposition method based on a convolutional neural network. This method can be applied to many image processing tasks. In this paper, we apply the image decomposition network to the image fusion task. We take an infrared image and a visible-light image as input and decompose each of them into three high-frequency feature images and one low-frequency feature image. The two sets of feature images are fused using a specific fusion strategy to obtain fused feature images. Finally, the fused feature images are reconstructed to obtain the fused image. Compared with state-of-the-art fusion methods, this method achieves better performance in both subjective and objective evaluation.
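The decompose, fuse, and reconstruct pipeline can be sketched as follows. This is a simplified stand-in, not the proposed network: a fixed box blur replaces the learned decomposition, a single high-frequency residual replaces the three high-frequency feature images, and the fusion rule (average the low-frequency parts, keep the larger-magnitude high-frequency value) is an assumption, since the abstract does not name the strategy. All function names are hypothetical.

```python
import numpy as np

def box_blur(image, k=3):
    """Mean filter with edge padding; used here as a simple low-pass filter (k odd)."""
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + h, dj:dj + w]
    return out / (k * k)

def decompose(image):
    """Split an image into a low-frequency part and a high-frequency residual."""
    low = box_blur(image)
    return low, image - low

def fuse(infrared, visible):
    """Fuse two registered images of equal shape."""
    low_a, high_a = decompose(infrared)
    low_b, high_b = decompose(visible)
    fused_low = 0.5 * (low_a + low_b)  # assumed rule: average base layers
    # Assumed rule: keep the detail coefficient with the larger magnitude.
    fused_high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    return fused_low + fused_high      # reconstruction by summation

rng = np.random.default_rng(1)
infrared = rng.random((8, 8))
visible = rng.random((8, 8))
fused = fuse(infrared, visible)
```

With this additive decomposition, fusing an image with itself returns the image unchanged, which is a convenient sanity check for any choice of fusion rule.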
Disentangling the content and style information of an image has played an important role in the recent success of image translation. In this setting, how to inject a given style into an input image that has its own content is an important issue, but existing methods have followed relatively simple approaches, leaving room for improvement, especially when significant style changes are involved. In response, we propose an advanced normalization technique based on adaptive convolution (AdaCoN) to properly impose style information on the content of an input image. In detail, after locally standardizing the content representation in a channel-wise manner, AdaCoN performs adaptive convolution in which the convolution filter weights are dynamically estimated from the encoded style representation. The flexibility of AdaCoN allows it to handle complicated image translation tasks involving significant style changes. Our qualitative and quantitative experiments demonstrate the superiority of the proposed method over various existing approaches that inject style into content.
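The two steps named above, local channel-wise standardization followed by a convolution whose weights come from the style code, can be sketched as below. This is an illustrative reading under stated assumptions, not the AdaCoN layer: the window size, the 1x1 filter shape, and the linear `predictor` matrix that maps a style code to filter weights are all hypothetical.

```python
import numpy as np

def local_standardize(feat, k=3, eps=1e-5):
    """Standardize each channel of feat (C, H, W) with k x k neighborhood statistics."""
    _, h, w = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    mean = np.zeros(feat.shape, dtype=float)
    sq_mean = np.zeros(feat.shape, dtype=float)
    for di in range(k):
        for dj in range(k):
            window = padded[:, di:di + h, dj:dj + w]
            mean += window
            sq_mean += window ** 2
    mean /= k * k
    sq_mean /= k * k
    var = np.maximum(sq_mean - mean ** 2, 0.0)  # guard against tiny negatives
    return (feat - mean) / np.sqrt(var + eps)

def style_conditioned_conv(content, style_code, predictor):
    """Convolve standardized content with 1x1 filters predicted from the style code.

    predictor: hypothetical (C*C, S) matrix standing in for a learned filter estimator.
    """
    c = content.shape[0]
    weights = (predictor @ style_code).reshape(c, c)  # (out_channels, in_channels)
    normalized = local_standardize(content)
    return np.einsum("oc,chw->ohw", weights, normalized)

rng = np.random.default_rng(2)
content = rng.normal(size=(3, 8, 8))
predictor = rng.normal(size=(9, 4))
style_a = rng.normal(size=4)
style_b = rng.normal(size=4)
out_a = style_conditioned_conv(content, style_a, predictor)
out_b = style_conditioned_conv(content, style_b, predictor)
```

The same content features yield different outputs for `style_a` and `style_b`, because the style code determines the filter weights rather than only a per-channel scale and shift.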
Robust road segmentation is a key challenge in self-driving research. Although many image-based methods have been studied and high performance in dataset evaluations has been reported, developing robust and reliable road segmentation is still a major challenge. Data fusion across different sensors to improve the performance of road segmentation is widely considered an important and irreplaceable solution. In this paper, we propose a novel structure to fuse image and LiDAR point cloud data in an end-to-end semantic segmentation network, in which the fusion is performed at the decoder stage rather than, as is more common, at the encoder stage. During fusion, we improve the precision of the multi-scale LiDAR maps by introducing a pyramid projection method for their generation. Additionally, we adapt the multi-path refinement network to our fusion strategy and improve road prediction compared with transposed convolution with skip layers. Our approach has been tested on the KITTI ROAD dataset and achieves competitive performance.