No Arabic abstract
Demosaicking and denoising are among the most crucial steps of modern digital camera pipelines and their joint treatment is a highly ill-posed inverse problem where at-least two-thirds of the information are missing and the rest are corrupted by noise. This poses a great challenge in obtaining meaningful reconstructions and a special care for the efficient treatment of the problem is required. While there are several machine learning approaches that have been recently introduced to deal with joint image demosaicking-denoising, in this work we propose a novel deep learning architecture which is inspired by powerful classical image regularization methods and large-scale convex optimization techniques. Consequently, our derived network is more transparent and has a clear interpretation compared to alternative competitive deep learning approaches. Our extensive experiments demonstrate that our network outperforms any previous approaches on both noisy and noise-free data. This improvement in reconstruction quality is attributed to the principled way we design our network architecture, which also requires fewer trainable parameters than the current state-of-the-art deep network solution. Finally, we show that our network has the ability to generalize well even when it is trained on small datasets, while keeping the overall number of trainable parameters low.
Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually related to denoising and demosaicking where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to jointly solve these problems, i.e. joint denoising-demosaicking which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest are perturbed by noise. While there are several machine learning systems that have been recently introduced to solve this problem, the majority of them relies on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm which is inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data driven approaches. Our extensive experimentation line demonstrates that our proposed method outperforms any previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and furthermore can be efficiently trained by using a significantly smaller number of training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick
During the image acquisition process, noise is usually added to the data mainly due to physical limitations of the acquisition sensor, and also regarding imprecisions during the data transmission and manipulation. In that sense, the resultant image needs to be processed to attenuate its noise without losing details. Non-learning-based strategies such as filter-based and noise prior modeling have been adopted to solve the image denoising problem. Nowadays, learning-based denoising techniques showed to be much more effective and flexible approaches, such as Residual Convolutional Neural Networks. Here, we propose a new learning-based non-blind denoising technique named Attention Residual Convolutional Neural Network (ARCNN), and its extension to blind denoising named Flexible Attention Residual Convolutional Neural Network (FARCNN). The proposed methods try to learn the underlying noise expectation using an Attention-Residual mechanism. Experiments on public datasets corrupted by different levels of Gaussian and Poisson noise support the effectiveness of the proposed approaches against some state-of-the-art image denoising methods. ARCNN achieved an overall average PSNR results of around 0.44dB and 0.96dB for Gaussian and Poisson denoising, respectively FARCNN presented very consistent results, even with slightly worsen performance compared to ARCNN.
The breakthrough of contrastive learning (CL) has fueled the recent success of self-supervised learning (SSL) in high-level vision tasks on RGB images. However, CL is still ill-defined for low-level vision tasks, such as joint demosaicking and denoising (JDD), in the RAW domain. To bridge this methodological gap, we present a novel CL approach on RAW images, residual contrastive learning (RCL), which aims to learn meaningful representations for JDD. Our work is built on the assumption that noise contained in each RAW image is signal-dependent, thus two crops from the same RAW image should have more similar noise distribution than two crops from different RAW images. We use residuals as a discriminative feature and the earth movers distance to measure the distribution divergence for the contrastive loss. To evaluate the proposed CL strategy, we simulate a series of unsupervised JDD experiments with large-scale data corrupted by synthetic signal-dependent noise, where we set a new benchmark for unsupervised JDD tasks with unknown (random) noise variance. Our empirical study not only validates that CL can be applied on distributions (c.f. features), but also exposes the lack of robustness of previous non-ML and SSL JDD methods when the statistics of the noise are unknown, thus providing some further insight into signal-dependent noise problems.
The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow. Inspired by recent advances in deep learning, we propose a framework for reconstructing MR images from undersampled data using a deep cascade of convolutional neural networks to accelerate the data acquisition process. We show that for Cartesian undersampling of 2D cardiac MR images, the proposed method outperforms the state-of-the-art compressed sensing approaches, such as dictionary learning-based MRI (DLMRI) reconstruction, in terms of reconstruction error, perceptual quality and reconstruction speed for both 3-fold and 6-fold undersampling. Compared to DLMRI, the error produced by the method proposed is approximately twice as small, allowing to preserve anatomical structures more faithfully. Using our method, each image can be reconstructed in 23 ms, which is fast enough to enable real-time applications.
Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data is acquired using aggressive Cartesian undersampling. Firstly, we show that when each 2D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2D compressed sensing approaches such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Secondly, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10s and, for the 2D case, each image frame can be reconstructed in 23ms, enabling real-time applications.