A Tour of Convolutional Networks Guided by Linear Interpreters


Published by Pablo Navarrete Michelini

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Pablo Navarrete Michelini - Hanwen Liu - Yunhua Lu


Abstract

Convolutional networks are large linear systems divided into layers and connected by non-linear units. These units are the articulations that allow the network to adapt to the input. To understand how a network manages to solve a problem, we must look at the articulated decisions in their entirety. If we could capture the actions of the non-linear units for a particular input, we would be able to replay the whole system back and forth as if it were always linear. It would also reveal the actions of the non-linearities, because the resulting linear system, a Linear Interpreter, depends on the input image. We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. Its implementation is simple, flexible and efficient. From here we can make many curious inquiries: What do these linear systems look like? When the rows and columns of the transformation matrix are images, what do they look like? What type of basis do these linear transformations rely on? The answers depend on the problems presented, through which we take a tour of some popular architectures used for classification, super-resolution (SR) and image-to-image translation (I2I). For classification we observe that popular networks use a pixel-wise vote-per-class strategy and rely heavily on bias parameters. For SR and I2I we find that CNNs use wavelet-type bases similar to the human visual system. For I2I we reveal copy-move and template-creation strategies to generate outputs.
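The idea of freezing the non-linear decisions for one input can be illustrated with a minimal NumPy sketch. This is not the authors' LinearScope implementation: the two-layer network, its random weights, and the names `forward` and `linear_interpreter` are assumptions made purely for illustration, and dense layers stand in for convolutions (which are also linear maps). For a ReLU network, recording which units are active for a given input turns each ReLU into a fixed 0/1 diagonal matrix, so the whole network collapses to a matrix `A` and a bias vector `b` that reproduce the output for that input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer ReLU network with random weights (illustration only).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def forward(x):
    """Ordinary non-linear forward pass."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def linear_interpreter(x):
    """Freeze the ReLU decisions made for x; return (A, b) with forward(x) == A @ x + b."""
    mask = (W1 @ x + b1 > 0).astype(float)  # recorded on/off state of each ReLU
    D = np.diag(mask)                       # for this x, ReLU acts as a 0/1 diagonal matrix
    A = W2 @ D @ W1                         # input-dependent transformation matrix
    b = W2 @ D @ b1 + b2                    # accumulated contribution of the bias parameters
    return A, b

x = rng.normal(size=4)
A, b = linear_interpreter(x)

# The linear system reproduces the network output for this particular input.
assert np.allclose(forward(x), A @ x + b)
```

In this sketch the rows of `A` show how each output depends on the input entries, and `b` isolates how much of the output comes from the bias parameters alone. Both change when the input changes, because the recorded mask changes.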


Read also

Dilation theory: a guided tour

Orr Shalit, 2020

Dilation theory is a paradigm for studying operators by way of exhibiting an operator as a compression of another operator which is in some sense well behaved. For example, every contraction can be dilated to (i.e., is a compression of) a unitary operator, and on this simple fact a penetrating theory of non-normal operators has been developed. In the first part of this survey, I will leisurely review key classical results on dilation theory for a single operator or for several commuting operators, and sample applications of dilation theory in operator theory and in function theory. Then, in the second part, I will give a rapid account of a plethora of variants of dilation theory and their applications. In particular, I will discuss dilation theory of completely positive maps and semigroups, as well as the operator algebraic approach to dilation theory. In the last part, I will present relatively new dilation problems in the noncommutative setting which are related to the study of matrix convex sets and operator systems, and are motivated by applications in control theory. These problems include dilating tuples of noncommuting operators to tuples of commuting normal operators with a specified joint spectrum. I will also describe the recently studied problem of determining the optimal constant $c = c_{\theta,\theta'}$, such that every pair of unitaries $U,V$ satisfying $VU = e^{i\theta} UV$ can be dilated to a pair of $cU', cV'$, where $U',V'$ are unitaries that satisfy the commutation relation $V'U' = e^{i\theta'} U'V'$. The solution of this problem gives rise to a new and surprising application of dilation theory to the continuity of the spectrum of the almost Mathieu operator from mathematical physics.

Operator Algebras, Functional Analysis

ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks

Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann, 2018

We introduce an approach to training a given compact network. To this end, we leverage over-parameterization, which typically improves both neural network optimization and generalization. Specifically, we propose to expand each linear layer of the compact network into multiple consecutive linear layers, without adding any nonlinearity. As such, the resulting expanded network, or ExpandNet, can be contracted back to the compact one algebraically at inference. In particular, we introduce two convolutional expansion strategies and demonstrate their benefits on several tasks, including image classification, object detection, and semantic segmentation. As evidenced by our experiments, our approach outperforms both training the compact network from scratch and performing knowledge distillation from a teacher. Furthermore, our linear over-parameterization empirically reduces gradient confusion during training and improves the network generalization.

Computer Vision and Pattern Recognition, Machine Learning

AVGCN: Trajectory Prediction using Graph Convolutional Networks Guided by Human Attention

Congcong Liu, Yuying Chen, Ming Liu, 2021

Pedestrian trajectory prediction is a critical yet challenging task, especially for crowded scenes. We suggest that introducing an attention mechanism to infer the importance of different neighbors is critical for accurate trajectory prediction in scenes with varying crowd size. In this work, we propose a novel method, AVGCN, for trajectory prediction utilizing graph convolutional networks (GCN) based on human attention (A denotes attention, V denotes visual field constraints). First, we train an attention network that estimates the importance of neighboring pedestrians, using gaze data collected as subjects perform a bird's-eye-view crowd navigation task. Then, we incorporate the learned attention weights, modulated by constraints on the pedestrians' visual field, into a trajectory prediction network that uses a GCN to aggregate information from neighbors efficiently. AVGCN also considers the stochastic nature of pedestrian trajectories by taking advantage of variational trajectory prediction. Our approach achieves state-of-the-art performance on several trajectory prediction benchmarks, and the lowest average prediction error over all considered benchmarks.

Computer Vision and Pattern Recognition, Robotics

Linear and Deformable Image Registration with 3D Convolutional Neural Networks

Stergios Christodoulidis, Mihir Sahasrabudhe, Maria Vakalopoulou, 2018

Image registration, and in particular deformable registration, is a pillar of medical imaging. Inspired by the recent advances in deep learning, we propose in this paper a novel convolutional neural network architecture that couples linear and deformable registration within a unified architecture endowed with near real-time performance. Our framework is modular with respect to the global transformation component, as well as with respect to the similarity function, while it guarantees smooth displacement fields. We evaluate the performance of our network on the challenging problem of MRI lung registration, and demonstrate superior performance with respect to state-of-the-art elastic registration methods. The proposed deformation (between inspiration and expiration) was considered within a clinically relevant task of interstitial lung disease (ILD) classification and showed promising results.

Computer Vision and Pattern Recognition, Machine Learning

A Guided Tour to Normalized Volume