ترغب بنشر مسار تعليمي؟ اضغط هنا

Homogeneous Linear Inequality Constraints for Neural Network Activations

328   0   0.0 ( 0 )
 نشر من قبل Thomas Frerix
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a method to impose homogeneous linear inequality constraints of the form $Axleq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this task is by means of a projection step at test time after unconstrained training. However, this is an expensive operation. By directly incorporating the constraints into the architecture, we can significantly speed-up inference at test time; for instance, our experiments show a speed-up of up to two orders of magnitude over a projection method. Our algorithm computes a suitable parameterization of the feasible set at initialization and uses standard variants of stochastic gradient descent to find solutions to the constrained network. Thus, the modeling constraints are always satisfied during training. Crucially, our approach avoids to solve an optimization problem at each training step or to manually trade-off data and constraint fidelity with additional hyperparameters. We consider constrained generative modeling as an important application domain and experimentally demonstrate the proposed method by constraining a variational autoencoder.


قيم البحث

اقرأ أيضاً

Understanding the loss surface of a neural network is fundamentally important to the understanding of deep learning. This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prov e that {it the loss surfaces of many neural networks have infinite spurious local minima} which are defined as the local minima with higher empirical risks than the global minima. Our result demonstrates that the networks with piecewise linear activations possess substantial differences to the well-studied linear neural networks. This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice. Essentially, the underlying assumptions are consistent with most practical circumstances where the output layer is narrower than any hidden layer. In addition, the loss surface of a neural network with piecewise linear activations is partitioned into multiple smooth and multilinear cells by nondifferentiable boundaries. The constructed spurious local minima are concentrated in one cell as a valley: they are connected with each other by a continuous path, on which empirical risk is invariant. Further for one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
82 - Rui Zhu , Bo Lin , Haixu Tang 2020
The number of linear regions is one of the distinct properties of the neural networks using piecewise linear activation functions such as ReLU, comparing with those conventional ones using other activation functions. Previous studies showed this prop erty reflected the expressivity of a neural network family ([14]); as a result, it can be used to characterize how the structural complexity of a neural network model affects the function it aims to compute. Nonetheless, it is challenging to directly compute the number of linear regions; therefore, many researchers focus on estimating the bounds (in particular the upper bound) of the number of linear regions for deep neural networks using ReLU. These methods, however, attempted to estimate the upper bound in the entire input space. The theoretical methods are still lacking to estimate the number of linear regions within a specific area of the input space, e.g., a sphere centered at a training data point such as an adversarial example or a backdoor trigger. In this paper, we present the first method to estimate the upper bound of the number of linear regions in any sphere in the input space of a given ReLU neural network. We implemented the method, and computed the bounds in deep neural networks using the piece-wise linear active function. Our experiments showed that, while training a neural network, the boundaries of the linear regions tend to move away from the training data points. In addition, we observe that the spheres centered at the training data points tend to contain more linear regions than any arbitrary points in the input space. To the best of our knowledge, this is the first study of bounding linear regions around a specific data point. We consider our work as a first step toward the investigation of the structural complexity of deep neural networks in a specific input area.
This work views neural networks as data generating systems and applies anomalous pattern detection techniques on that data in order to detect when a network is processing an anomalous input. Detecting anomalies is a critical component for multiple ma chine learning problems including detecting adversarial noise. More broadly, this work is a step towards giving neural networks the ability to recognize an out-of-distribution sample. This is the first work to introduce Subset Scanning methods from the anomalous pattern detection domain to the task of detecting anomalous input of neural networks. Subset scanning treats the detection problem as a search for the most anomalous subset of node activations (i.e., highest scoring subset according to non-parametric scan statistics). Mathematical properties of these scoring functions allow the search to be completed in log-linear rather than exponential time while still guaranteeing the most anomalous subset of nodes in the network is identified for a given input. Quantitative results for detecting and characterizing adversarial noise are provided for CIFAR-10 images on a simple convolutional neural network. We observe an interference pattern where anomalous activations in shallow layers suppress the activation structure of the original image in deeper layers.
Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production, especiall y when models need to be frequently re-trained on updated datasets. In this paper, we propose a new implementation for back-propagation that significantly reduces memory usage, by enabling the use of approximations with negligible computational cost and minimal effect on training performance. The algorithm reuses common buffers to temporarily store full activations and compute the forward pass exactly. It also stores approximate per-layer copies of activations, at significant memory savings, that are used in the backward pass. Compared to simply approximating activations within standard back-propagation, our method limits accumulation of errors across layers. This allows the use of much lower-precision approximations without affecting training accuracy. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training, while storing activations compactly with as low as 4-bit precision.
The distribution of a neural networks latent representations has been successfully used to detect out-of-distribution (OOD) data. This work investigates whether this distribution moreover correlates with a models epistemic uncertainty, thus indicates its ability to generalise to novel inputs. We first empirically verify that epistemic uncertainty can be identified with the surprise, thus the negative log-likelihood, of observing a particular latent representation. Moreover, we demonstrate that the output-conditional distribution of hidden representations also allows quantifying aleatoric uncertainty via the entropy of the predictive distribution. We analyse epistemic and aleatoric uncertainty inferred from the representations of different layers and conclude that deeper layers lead to uncertainty with similar behaviour as established - but computationally more expensive - methods (e.g. deep ensembles). While our approach does not require modifying the training process, we follow prior work and experiment with an additional regularising loss that increases the information in the latent representations. We find that this leads to improved OOD detection of epistemic uncertainty at the cost of ambiguous calibration close to the data distribution. We verify our findings on both classification and regression models.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا