Many of our core assumptions about how neural networks operate remain empirically untested. One common assumption is that convolutional neural networks need to be stable to small translations and deformations to solve image recognition tasks. For many years, this stability was baked into CNN architectures by incorporating interleaved pooling layers. Recently, however, interleaved pooling has largely been abandoned. This raises a number of questions: Are our intuitions about deformation stability right at all? Is it important? Is pooling necessary for deformation invariance? If not, how is deformation invariance achieved in its absence? In this work, we investigate these questions empirically and find that deformation stability in convolutional networks is more nuanced than it first appears: (1) Deformation invariance is not a binary property; rather, different tasks require different degrees of deformation stability at different layers. (2) Deformation stability is not a fixed property of a network and is heavily adjusted over the course of training, largely through the smoothness of the convolutional filters. (3) Interleaved pooling layers are neither necessary nor sufficient for achieving the optimal form of deformation stability for natural image classification. (4) Pooling confers too much deformation stability for image classification at initialization, and during training, networks have to learn to counteract this inductive bias. Together, these findings provide new insights into the role of interleaved pooling and deformation invariance in CNNs, and demonstrate the importance of rigorous empirical testing of even our most basic assumptions about how neural networks work.
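To make the notion of deformation stability concrete, here is a minimal sketch of one way to probe a network's sensitivity to small, smooth image deformations: warp inputs with a random low-frequency displacement field and measure how much a chosen layer's features change. The function names, the smoothing scheme, and the distance metric are illustrative choices, not the paper's exact protocol.

```python
# Hedged sketch: probing a CNN feature extractor's sensitivity to small, smooth
# image deformations. All names and hyperparameters here are illustrative.
import torch
import torch.nn.functional as F

def smooth_deformation(images, magnitude=2.0, smoothing=7):
    """Warp a batch of images with a random, spatially smooth displacement field."""
    n, _, h, w = images.shape
    # Random per-pixel displacement (in pixels), smoothed to be low-frequency.
    flow = torch.randn(n, 2, h, w) * magnitude
    flow = F.avg_pool2d(flow, smoothing, stride=1, padding=smoothing // 2)
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)            # (1, h, w, 2)
    # Convert pixel displacements to normalized coordinates.
    offset = flow.permute(0, 2, 3, 1) * 2.0 / torch.tensor([w, h], dtype=torch.float32)
    return F.grid_sample(images, base + offset, align_corners=True)

def deformation_sensitivity(feature_fn, images):
    """Median relative change in features under a random smooth deformation."""
    with torch.no_grad():
        f0 = feature_fn(images).flatten(1)
        f1 = feature_fn(smooth_deformation(images)).flatten(1)
    return ((f0 - f1).norm(dim=1) / f0.norm(dim=1).clamp_min(1e-8)).median().item()
```

Applying this measure to different layers, before and after training, is one way to track how much stability a network has and how training adjusts it.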
Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features and thus has the potential to improve the representation and generalization abilities of deep CNNs. However, integrating global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample size; (2) appropriate use of the geometry of covariances. To address these challenges, we propose global Matrix Power Normalized COVariance (MPN-COV) pooling. MPN-COV amounts to a robust covariance estimator, well suited to the high-dimension, small-sample-size regime. It can also be regarded as a Power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding network is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we implement an iterative matrix square root normalization, avoiding the GPU-unfriendly eigendecomposition inherent in MPN-COV. Additionally, progressive 1x1 convolutions and group convolution are introduced to compress covariance representations. The proposed methods are highly modular and can be readily plugged into existing deep CNNs. Extensive experiments are conducted on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification, showing that our methods outperform their counterparts and achieve state-of-the-art performance.
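The two computational ingredients mentioned above, global covariance pooling and an iterative (eigendecomposition-free) matrix square-root normalization, can be sketched as follows. This is a generic Newton-Schulz-style illustration under my own choices of iteration count and scaling, not the authors' released implementation.

```python
# Hedged sketch: global covariance pooling followed by an approximate matrix
# square root via coupled Newton-Schulz iteration (a GPU-friendly alternative
# to eigendecomposition). Hyperparameters are illustrative.
import torch

def covariance_pool(features):
    """Global covariance pooling: (N, C, H, W) features -> (N, C, C) covariances."""
    n, c, h, w = features.shape
    x = features.reshape(n, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    return x @ x.transpose(1, 2) / (h * w - 1)

def newton_schulz_sqrt(cov, iters=5):
    """Approximate matrix square root of SPD matrices without eigendecomposition."""
    n, c, _ = cov.shape
    identity = torch.eye(c, device=cov.device).expand(n, c, c)
    # Pre-normalize by the trace so the iteration converges.
    norm = cov.diagonal(dim1=1, dim2=2).sum(dim=1).view(n, 1, 1)
    y = cov / norm
    z = identity.clone()
    for _ in range(iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t
        z = t @ z
    # Undo the pre-normalization: sqrt(cov) = sqrt(norm) * sqrt(cov / norm).
    return y * norm.sqrt()

# Example: pooled = newton_schulz_sqrt(covariance_pool(torch.randn(2, 64, 7, 7)))
```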
In this paper, we challenge the common assumption that collapsing the spatial dimensions of a 3D (spatial-channel) tensor in a convolutional neural network (CNN) into a vector via global pooling removes all spatial information. Specifically, we demonstrate that positional information is encoded based on the ordering of the channel dimensions, while semantic information is largely not. Following this demonstration, we show the real-world impact of these findings by applying them to two applications. First, we propose a simple yet effective data augmentation strategy and loss function which improves the translation invariance of a CNN's output. Second, we propose a method to efficiently determine which channels in the latent representation are responsible for encoding (i) overall position information or (ii) region-specific positions. We first show that semantic segmentation relies significantly on the overall position channels to make predictions. We then show for the first time that it is possible to perform a 'region-specific' attack, degrading a network's performance in a particular part of the input. We believe our findings and demonstrated applications will benefit research areas concerned with understanding the characteristics of CNNs.
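A small sketch of the setting being questioned here: global average pooling collapses the spatial dimensions, and a naive check of translation invariance compares pooled features of an image with those of a shifted copy. The toy model, shift size, and distance are my illustrative choices, not the paper's probes for channel-encoded position.

```python
# Hedged sketch: global average pooling and a crude translation-invariance probe.
# TinyCNN and pooled_shift_distance are illustrative names, not from the paper.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x):
        return self.pool(self.features(x)).flatten(1)  # (N, C) pooled vector

def pooled_shift_distance(model, images, shift=8):
    """Mean distance between pooled features of an image and a translated copy."""
    shifted = torch.roll(images, shifts=(shift, shift), dims=(2, 3))
    with torch.no_grad():
        return (model(images) - model(shifted)).norm(dim=1).mean().item()

model = TinyCNN().eval()
print(pooled_shift_distance(model, torch.randn(4, 3, 64, 64)))
```

The paper's point is that even when this pooled vector looks translation-stable, the ordering of its channels can still carry positional information that downstream layers exploit.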
Two long-standing open problems in quantum theory are to characterize the class of initial system-bath states for which quantum dynamics is equivalent to (1) a map between the initial and final system states, and (2) a completely positive (CP) map. The CP map problem is especially important, due to the widespread use of such maps in quantum information processing and open quantum systems theory. Here we settle both of these questions by showing that the answer to the first is all states, with the resulting map being Hermitian, and that the answer to the second is that CP maps arise exclusively from the class of separable states with vanishing quantum discord.
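For context, the complete positivity referred to here is the standard textbook property that a map admits an operator-sum (Kraus) representation; this is a quoted definition, not a result of the paper:

```latex
% Standard Kraus (operator-sum) form of a completely positive, trace-preserving map.
\Phi(\rho_S) \;=\; \sum_i K_i \,\rho_S\, K_i^{\dagger},
\qquad \sum_i K_i^{\dagger} K_i \;=\; \mathbb{1}_S .
```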
We propose a new model for digital pathology segmentation, based on the observation that histopathology images are inherently symmetric under rotation and reflection. Utilizing recent findings on rotation-equivariant CNNs, the proposed model leverages these symmetries in a principled manner. We present a visual analysis showing improved stability of predictions, and demonstrate that exploiting rotation equivariance significantly improves tumor detection performance on a challenging lymph node metastases dataset. We further present a novel derived dataset to enable principled comparison of machine learning models, together with an initial benchmark. Through this dataset, the task of histopathology diagnosis becomes accessible as a challenging benchmark for fundamental machine learning research.
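The symmetry being exploited is that of the square: the four rotations and their reflections (the dihedral group D4). The sketch below only averages predictions over these eight transformations at test time, a crude proxy for, not an implementation of, the group-equivariant convolutions the abstract refers to, where the symmetry is built into the layers themselves.

```python
# Hedged sketch: test-time averaging over the 8 symmetries of the square (D4).
# A simple baseline for comparison; group-equivariant CNNs bake this symmetry
# into the convolution operation instead.
import torch

def dihedral_average(model, images):
    """Average model outputs over all rotations and flips of the input."""
    outputs = []
    for flip in (False, True):
        x = torch.flip(images, dims=(3,)) if flip else images
        for k in range(4):  # 0, 90, 180, 270 degree rotations
            outputs.append(model(torch.rot90(x, k, dims=(2, 3))))
    return torch.stack(outputs).mean(dim=0)
```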
Many-Worlds quantum mechanics differs from standard quantum mechanics in that in Many-Worlds, the wave function is a relative density of universes in the multiverse rather than a probability amplitude. This means that in Many-Worlds, the Born frequencies are approached rather than given a priori. Thus in Many-Worlds the rate of approach to the final frequencies can be calculated and compared with observation. I use Many-Worlds to derive the rate of approach in the double-slit experiment, and show that it agrees with observation. Standard quantum theory has never been used to derive such an approach formula because it cannot be so used, as has been tacitly acknowledged for 70 years.