No Arabic abstract
Image restoration tasks demand a complex balance between spatial details and high-level contextualized information while recovering images. In this paper, we propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture, that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps. Specifically, our model first learns the contextualized features using encoder-decoder architectures and later combines them with a high-resolution branch that retains local information. At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features. A key ingredient in such a multi-stage architecture is the information exchange between different stages. To this end, we propose a two-faceted approach where the information is not only exchanged sequentially from early to late stages, but lateral connections between feature processing blocks also exist to avoid any loss of information. The resulting tightly interlinked multi-stage architecture, named as MPRNet, delivers strong performance gains on ten datasets across a range of tasks including image deraining, deblurring, and denoising. The source code and pre-trained models are available at https://github.com/swz30/MPRNet.
Visual tracking is typically solved as a discriminative learning problem that usually requires high-quality samples for online model adaptation. It is a critical and challenging problem to evaluate the training samples collected from previous predictions and employ sample selection by their quality to train the model. To tackle the above problem, we propose a joint discriminative learning scheme with the progressive multi-stage optimization policy of sample selection for robust visual tracking. The proposed scheme presents a novel time-weighted and detection-guided self-paced learning strategy for easy-to-hard sample selection, which is capable of tolerating relatively large intra-class variations while maintaining inter-class separability. Such a self-paced learning strategy is jointly optimized in conjunction with the discriminative tracking process, resulting in robust tracking results. Experiments on the benchmark datasets demonstrate the effectiveness of the proposed learning framework.
A moire pattern in the images is resulting from high frequency patterns captured by the image sensor (colour filter array) that appear after demosaicing. These Moire patterns would appear in natural images of scenes with high frequency content. The Moire pattern can also vary intensely due to a minimal change in the camera direction/positioning. Thus the Moire pattern depreciates the quality of photographs. An important issue in demoireing pattern is that the Moireing patterns have dynamic structure with varying colors and forms. These challenges makes the demoireing more difficult than many other image restoration tasks. Inspired by these challenges in demoireing, a multilevel hyper vision net is proposed to remove the Moire pattern to improve the quality of the images. As a key aspect, in this network we involved residual channel attention block that can be used to extract and adaptively fuse hierarchical features from all the layers efficiently. The proposed algorithms has been tested with the NTIRE 2020 challenge dataset and thus achieved 36.85 and 0.98 Peak to Signal Noise Ratio (PSNR) and Structural Similarity (SSIM) Index respectively.
It is a challenging task to restore images from their variants with combined distortions. In the existing works, a promising strategy is to apply parallel operations to handle different types of distortion. However, in the feature fusion phase, a small number of operations would dominate the restoration result due to the features heterogeneity by different operations. To this end, we introduce the tensor 1x1 convolutional layer by imposing high-order tensor (outer) product, by which we not only harmonize the heterogeneous features but also take additional non-linearity into account. To avoid the unacceptable kernel size resulted from the tensor product, we construct the kernels with tensor network decomposition, which is able to convert the exponential growth of the dimension to linear growth. Armed with the new layer, we propose High-order OWAN for multi-distorted image restoration. In the numerical experiments, the proposed net outperforms the previous state-of-the-art and shows promising performance even in more difficult tasks.
The widespread application of audio communication technologies has speeded up audio data flowing across the Internet, which made it a popular carrier for covert communication. In this paper, we present a cross-modal steganography method for hiding image content into audio carriers while preserving the perceptual fidelity of the cover audio. In our framework, two multi-stage networks are designed: the first network encodes the decreasing multilevel residual errors inside different audio subsequences with the corresponding stage sub-networks, while the second network decodes the residual errors from the modified carrier with the corresponding stage sub-networks to produce the final revealed results. The multi-stage design of proposed framework not only make the controlling of payload capacity more flexible, but also make hiding easier because of the gradual sparse characteristic of residual errors. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded images are highly intelligible.
Face restoration is important in face image processing, and has been widely studied in recent years. However, previous works often fail to generate plausible high quality (HQ) results for real-world low quality (LQ) face images. In this paper, we propose a new progressive semantic-aware style transformation framework, named PSFR-GAN, for face restoration. Specifically, instead of using an encoder-decoder framework as previous methods, we formulate the restoration of LQ face images as a multi-scale progressive restoration procedure through semantic-aware style transformation. Given a pair of LQ face image and its corresponding parsing map, we first generate a multi-scale pyramid of the inputs, and then progressively modulate different scale features from coarse-to-fine in a semantic-aware style transfer way. Compared with previous networks, the proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs. In addition, we further introduce a semantic aware style loss which calculates the feature style loss for each semantic region individually to improve the details of face textures. Finally, we pretrain a face parsing network which can generate decent parsing maps from real-world LQ face images. Experiment results show that our model trained with synthetic data can not only produce more realistic high-resolution results for synthetic LQ inputs and but also generalize better to natural LQ face images compared with state-of-the-art methods. Codes are available at https://github.com/chaofengc/PSFRGAN.