MGBPv2: Scaling Up Multi-Grid Back-Projection Networks

102 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Pablo Navarrete Michelini

تاريخ النشر 2019

مجال البحث هندسة إلكترونية الهندسة المعلوماتية

والبحث باللغة English

تأليف Pablo Navarrete Michelini - Wenbin Chen - Hanwen Liu

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Here, we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won the 1st place in terms of perceptual quality (MOS) similar to the ground truth and achieved the 5th place in terms of high-fidelity (PSNR). To tackle this challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2) whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative backprojections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustments of the number of network features based on the scale of operation. For inference tasks, we introduce an overlapping patch approach to further allow processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as downscaled output images. The final system can balance between high quality and high performance.

قيم البحث

اقرأ أيضاً

Multi-Grid Back-Projection Networks

321 - Pablo Navarrete Michelini , Wenbin Chen , Hanwen Liu 2021

Multi-Grid Back-Projection (MGBP) is a fully-convolutional network architecture that can learn to restore images and videos with upscaling artifacts. Using the same strategy of multi-grid partial differential equation (PDE) solvers this multiscale ar chitecture scales computational complexity efficiently with increasing output resolutions. The basic processing block is inspired in the iterative back-projection (IBP) algorithm and constitutes a type of cross-scale residual block with feedback from low resolution references. The architecture performs in par with state-of-the-arts alternatives for regression targets that aim to recover an exact copy of a high resolution image or video from which only a downscale image is known. A perceptual quality target aims to create more realistic outputs by introducing artificial changes that can be different from a high resolution original content as long as they are consistent with the low resolution input. For this target we propose a strategy using noise inputs in different resolution scales to control the amount of artificial details generated in the output. The noise input controls the amount of innovation that the network uses to create artificial realistic details. The effectiveness of this strategy is shown in benchmarks and it is explained as a particular strategy to traverse the perception-distortion plane.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Back-Projection Pipeline

254 - Pablo Navarrete Michelini , Hanwen Liu , Yunhua Lu 2021

We propose a simple extension of residual networks that works simultaneously in multiple resolutions. Our network design is inspired by the iterative back-projection algorithm but seeks the more difficult task of learning how to enhance images. Compa red to similar approaches, we propose a novel solution to make back-projections run in multiple resolutions by using a data pipeline workflow. Features are updated at multiple scales in each layer of the network. The update dynamic through these layers includes interactions between different resolutions in a way that is causal in scale, and it is represented by a system of ODEs, as opposed to a single ODE in the case of ResNets. The system can be used as a generic multi-resolution approach to enhance images. We test it on several challenging tasks with special focus on super-resolution and raindrop removal. Our results are competitive with state-of-the-arts and show a strong ability of our system to learn both global and local image features.

معالجة الصور والفيديو

TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks

78 - Tobias Czempiel , Magdalini Paschali , Matthias Keicher 2020

Automatic surgical phase recognition is a challenging and crucial task with the potential to improve patient safety and become an integral part of intra-operative decision-support systems. In this paper, we propose, for the first time in workflow ana lysis, a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition. Causal, dilated convolutions allow for a large receptive field and online inference with smooth predictions even during ambiguous transitions. Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information. Outperforming various state-of-the-art LSTM approaches, we verify the suitability of the proposed causal MS-TCN for surgical phase recognition.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Automated Thalamic Nuclei Segmentation Using Multi-Planar Cascaded Convolutional Neural Networks

96 - Mohammad S Majdi 2019

A cascaded multi-planar scheme with a modified residual U-Net architecture was used to segment thalamic nuclei on conventional and white-matter-nulled (WMn) magnetization prepared rapid gradient echo (MPRAGE) data. A single network was optimized to w ork with images from healthy controls and patients with multiple sclerosis (MS) and essential tremor (ET), acquired at both 3T and 7T field strengths. Dice similarity coefficient and volume similarity index (VSI) were used to evaluate performance. Clinical utility was demonstrated by applying this method to study the effect of MS on thalamic nuclei atrophy. Segmentation of each thalamus into twelve nuclei was achieved in under a minute. For 7T WMn-MPRAGE, the proposed method outperforms current state-of-the-art on patients with ET with statistically significant improvements in Dice for five nuclei (increase in the range of 0.05-0.18) and VSI for four nuclei (increase in the range of 0.05-0.19), while performing comparably for healthy and MS subjects. Dice and VSI achieved using 7T WMn-MPRAGE data are comparable to those using 3T WMn-MPRAGE data. For conventional MPRAGE, the proposed method shows a statistically significant Dice improvement in the range of 0.14-0.63 over FreeSurfer for all nuclei and disease types. Effect of noise on network performance shows robustness to images with SNR as low as half the baseline SNR. Atrophy of four thalamic nuclei and whole thalamus was observed for MS patients compared to healthy control subjects, after controlling for the effect of parallel imaging, intracranial volume, gender, and age (p<0.004). The proposed segmentation method is fast, accurate, performs well across disease types and field strengths, and shows great potential for improving our understanding of thalamic nuclei involvement in neurological diseases.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Transfer Learning Enhanced Generative Adversarial Networks for Multi-Channel MRI Reconstruction

125 - Jun Lv , Guangyuan Li , Xiangrong Tong 2021

Deep learning based generative adversarial networks (GAN) can effectively perform image reconstruction with under-sampled MR data. In general, a large number of training samples are required to improve the reconstruction performance of a certain mode l. However, in real clinical applications, it is difficult to obtain tens of thousands of raw patient data to train the model since saving k-space data is not in the routine clinical flow. Therefore, enhancing the generalizability of a network based on small samples is urgently needed. In this study, three novel applications were explored based on parallel imaging combined with the GAN model (PI-GAN) and transfer learning. The model was pre-trained with public Calgary brain images and then fine-tuned for use in (1) patients with tumors in our center; (2) different anatomies, including knee and liver; (3) different k-space sampling masks with acceleration factors (AFs) of 2 and 6. As for the brain tumor dataset, the transfer learning results could remove the artifacts found in PI-GAN and yield smoother brain edges. The transfer learning results for the knee and liver were superior to those of the PI-GAN model trained with its own dataset using a smaller number of training cases. However, the learning procedure converged more slowly in the knee datasets compared to the learning in the brain tumor datasets. The reconstruction performance was improved by transfer learning both in the models with AFs of 2 and 6. Of these two models, the one with AF=2 showed better results. The results also showed that transfer learning with the pre-trained model could solve the problem of inconsistency between the training and test datasets and facilitate generalization to unseen data.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي