أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Hanwen Liu

254 - Pablo Navarrete Michelini , Hanwen Liu , Yunhua Lu 2021

We propose a simple extension of residual networks that works simultaneously in multiple resolutions. Our network design is inspired by the iterative back-projection algorithm but seeks the more difficult task of learning how to enhance images. Compa red to similar approaches, we propose a novel solution to make back-projections run in multiple resolutions by using a data pipeline workflow. Features are updated at multiple scales in each layer of the network. The update dynamic through these layers includes interactions between different resolutions in a way that is causal in scale, and it is represented by a system of ODEs, as opposed to a single ODE in the case of ResNets. The system can be used as a generic multi-resolution approach to enhance images. We test it on several challenging tasks with special focus on super-resolution and raindrop removal. Our results are competitive with state-of-the-arts and show a strong ability of our system to learn both global and local image features.

معالجة الصور والفيديو

Multi-Grid Back-Projection Networks

321 - Pablo Navarrete Michelini , Wenbin Chen , Hanwen Liu 2021

Multi-Grid Back-Projection (MGBP) is a fully-convolutional network architecture that can learn to restore images and videos with upscaling artifacts. Using the same strategy of multi-grid partial differential equation (PDE) solvers this multiscale ar chitecture scales computational complexity efficiently with increasing output resolutions. The basic processing block is inspired in the iterative back-projection (IBP) algorithm and constitutes a type of cross-scale residual block with feedback from low resolution references. The architecture performs in par with state-of-the-arts alternatives for regression targets that aim to recover an exact copy of a high resolution image or video from which only a downscale image is known. A perceptual quality target aims to create more realistic outputs by introducing artificial changes that can be different from a high resolution original content as long as they are consistent with the low resolution input. For this target we propose a strategy using noise inputs in different resolution scales to control the amount of artificial details generated in the output. The noise input controls the amount of innovation that the network uses to create artificial realistic details. The effectiveness of this strategy is shown in benchmarks and it is explained as a particular strategy to traverse the perception-distortion plane.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Anchor-Based Spatio-Temporal Attention 3D Convolutional Networks for Dynamic 3D Point Cloud Sequences

193 - Guangming Wang , Muyao Chen , Hanwen Liu 2020

With the rapid development of measurement technology, LiDAR and depth cameras are widely used in the perception of the 3D environment. Recent learning based methods for robot perception most focus on the image or video, but deep learning methods for dynamic 3D point cloud sequences are underexplored. Therefore, developing efficient and accurate perception method compatible with these advanced instruments is pivotal to autonomous driving and service robots. An Anchor-based Spatio-Temporal Attention 3D Convolution operation (ASTA3DConv) is proposed in this paper to process dynamic 3D point cloud sequences. The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around each point. The features of neighborhood points are firstly aggregated to each anchor based on the spatio-temporal attention mechanism. Then, anchor-based 3D convolution is adopted to aggregate these anchors features to the core points. The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3D point cloud sequences. Anchor-based Spatio-Temporal Attention 3D Convolutional Neural Networks (ASTA3DCNNs) are built for classification and segmentation tasks based on the proposed ASTA3DConv and evaluated on action recognition and semantic segmentation tasks. The experiments and ablation studies on MSRAction3D and Synthia datasets demonstrate the superior performance and effectiveness of our method for dynamic 3D point cloud sequences. Our method achieves the state-of-the-art performance among the methods with dynamic 3D point cloud sequences as input on MSRAction3D and Synthia datasets.

الرؤية الحاسوبية وتمييز الأنماط

Introduction to a novel T2 relaxation analysis method SAME-ECOS: Spectrum Analysis for Multiple Exponentials via Experimental Condition Oriented Simulation

301 - Hanwen Liu , Qing-San Xiang , Roger Tam 2020

We propose a novel T2 relaxation data analysis method which we have named spectrum analysis for multiple exponentials via experimental condition oriented simulation (SAME-ECOS). SAME-ECOS, which was developed based on a combination of information the ory and machine learning neural network algorithms, is tailored for different MR experimental conditions, decomposing multi-exponential decay data into T2 spectra, which had been considered an ill-posed problem using conventional fitting algorithms, including the commonly used non-negative least squares (NNLS) method. Our results demonstrated that, compared with NNLS, the simulation-derived SAME-ECOS model yields much more reliable T2 spectra in a dramatically shorter time, increasing the feasibility of multi-component T2 decay analysis in clinical settings.

الفيزياء الطبية معالجة الإشارات

The purely cosmetic surgery conjecture is true for the Kinoshita-Terasaka and Conway knot families

80 - Bryan Boehnke , Conan Gillis , Hanwen Liu 2020

We show that all nontrivial members of the Kinoshita-Terasaka and Conway knot families satisfy the purely cosmetic surgery conjecture.

الطوبولوجيا الهندسية

MGBPv2: Scaling Up Multi-Grid Back-Projection Networks

101 - Pablo Navarrete Michelini , Wenbin Chen , Hanwen Liu 2019

Here, we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won the 1st place in terms of perceptual quality (MOS) similar to the ground truth and achieved the 5th place in terms of high-fidelity (PSNR). To tackle thi s challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2) whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative backprojections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustments of the number of network features based on the scale of operation. For inference tasks, we introduce an overlapping patch approach to further allow processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as downscaled output images. The final system can balance between high quality and high performance.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

A Tour of Convolutional Networks Guided by Linear Interpreters

297 - Pablo Navarrete Michelini , Hanwen Liu , Yunhua Lu 2019

Convolutional networks are large linear systems divided into layers and connected by non-linear units. These units are the articulations that allow the network to adapt to the input. To understand how a network manages to solve a problem we must look at the articulated decisions in entirety. If we could capture the actions of non-linear units for a particular input, we would be able to replay the whole system back and forth as if it was always linear. It would also reveal the actions of non-linearities because the resulting linear system, a Linear Interpreter, depends on the input image. We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. Its implementation is simple, flexible and efficient. From here we can make many curious inquiries: how do these linear systems look like? When the rows and columns of the transformation matrix are images, how do they look like? What type of basis do these linear transformations rely on? The answers depend on the problems presented, through which we take a tour to some popular architectures used for classification, super-resolution (SR) and image-to-image translation (I2I). For classification we observe that popular networks use a pixel-wise vote per class strategy and heavily rely on bias parameters. For SR and I2I we find that CNNs use wavelet-type basis similar to the human visual system. For I2I we reveal copy-move and template-creation strategies to generate outputs.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التحليل العددي

Multigrid Backprojection Super-Resolution and Deep Filter Visualization

86 - Pablo Navarrete Michelini , Hanwen Liu , Dan Zhu 2018

We introduce a novel deep-learning architecture for image upscaling by large factors (e.g. 4x, 8x) based on examples of pristine high-resolution images. Our target is to reconstruct high-resolution images from their downsca

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Convolutional Networks with MuxOut Layers as Multi-rate Systems for Image Upscaling

53 - Pablo Navarrete Michelini , Hanwen Liu 2017

We interpret convolutional networks as adaptive filters and combine them with so-called MuxOut layers to efficiently upscale low resolution images. We formalize this interpretation by deriving a linear and space-variant structure of a convolutional n etwork when its activations are fixed. We introduce general purpose algorithms to analyze a network and show its overall filter effect for each given location. We use this analysis to evaluate two types of image upscalers: deterministic upscalers that target the recovery of details from original content; and second, a new generation of upscalers that can sample the distribution of upscale aliases (images that share the same downscale version) that look like real content.

الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد