
Bounding The Number of Linear Regions in Local Area for Neural Networks with ReLU Activations

Posted by Bo Lin
Publication date: 2020
Paper language: English





The number of linear regions is one of the distinguishing properties of neural networks that use piecewise linear activation functions such as ReLU, compared with conventional networks using other activation functions. Previous studies showed that this property reflects the expressivity of a neural network family ([14]); as a result, it can be used to characterize how the structural complexity of a neural network model affects the function it aims to compute. Nonetheless, directly computing the number of linear regions is challenging; therefore, many researchers focus on estimating bounds (in particular the upper bound) on the number of linear regions of deep ReLU networks. These methods, however, estimate the upper bound over the entire input space. Theoretical methods are still lacking for estimating the number of linear regions within a specific area of the input space, e.g., a sphere centered at a data point of interest such as a training example, an adversarial example, or a backdoor trigger. In this paper, we present the first method to estimate the upper bound of the number of linear regions in any sphere in the input space of a given ReLU neural network. We implemented the method and computed the bounds for deep neural networks using piecewise linear activation functions. Our experiments showed that, while training a neural network, the boundaries of the linear regions tend to move away from the training data points. In addition, we observe that spheres centered at training data points tend to contain more linear regions than spheres centered at arbitrary points in the input space. To the best of our knowledge, this is the first study of bounding the number of linear regions around a specific data point. We consider our work a first step toward investigating the structural complexity of deep neural networks in a specific input area.
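
To make the notion of counting linear regions inside a sphere concrete, the following sketch empirically counts distinct ReLU activation patterns among points sampled inside a ball around a data point; each distinct pattern corresponds to one linear region touched by the samples. This is only a Monte Carlo lower estimate for illustration, not the analytical upper-bound method of the paper, and the network sizes, radius, and sample count are arbitrary assumptions.

```python
# A minimal sketch (not the paper's upper-bound method): empirically count
# distinct ReLU activation patterns among points sampled inside a ball around
# a data point. Each distinct pattern corresponds to a linear region touched
# by the samples, so the count is only a Monte Carlo lower estimate.
# Network sizes, radius, and sample count are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)

# A small random two-hidden-layer ReLU network (assumed widths 16 and 16).
W1, b1 = rng.standard_normal((16, 8)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 16)), rng.standard_normal(16)

def activation_pattern(x):
    """Return the on/off state of every ReLU unit for input x."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((h1 > 0).tolist() + (h2 > 0).tolist())

def count_regions_in_ball(center, radius, n_samples=20000):
    """Count distinct activation patterns among samples drawn in the ball."""
    patterns = set()
    for _ in range(n_samples):
        direction = rng.standard_normal(center.size)
        direction /= np.linalg.norm(direction)
        r = radius * rng.uniform() ** (1.0 / center.size)  # uniform in ball
        patterns.add(activation_pattern(center + r * direction))
    return len(patterns)

center = rng.standard_normal(8)  # stands in for a training data point
print(count_regions_in_ball(center, radius=0.5))
```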


Read also

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
Neural networks have been widely used to solve complex real-world problems. Due to the complicated, nonlinear, non-convex nature of neural networks, formal safety guarantees for their output behaviors are crucial for applications in safety-critical systems. In this paper, the output reachable set computation and safety verification problems for a class of neural networks consisting of Rectified Linear Unit (ReLU) activation functions are addressed. A layer-by-layer approach is developed to compute the output reachable set. The computation is formulated as a set of manipulations on a union of polyhedra, which can be efficiently applied with the aid of polyhedron computation tools. Based on the output reachable set computation results, safety verification for a ReLU neural network can be performed by checking the intersections of unsafe regions with the output reachable set described by a union of polyhedra. A numerical example of a randomly generated ReLU neural network is provided to show the effectiveness of the developed approach.
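
As an illustration of the layer-by-layer idea, the sketch below propagates an input box through a small ReLU network using interval arithmetic. This is a deliberately coarser stand-in for the polyhedron-based computation described above: each layer's output set is over-approximated by a box instead of a union of polyhedra, and the weights and input bounds are illustrative assumptions, not the paper's numerical example.

```python
# A minimal sketch, assuming interval arithmetic as a much coarser stand-in
# for the polyhedron-based reachable-set computation: the same layer-by-layer
# idea, but every layer's output set is over-approximated by a box instead of
# a union of polyhedra. Weights and the input box are illustrative assumptions.
import numpy as np

def relu_layer_bounds(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> relu(W @ x + b)."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b  # lower bound of W x + b over the box
    out_hi = W_pos @ hi + W_neg @ lo + b  # upper bound of W x + b over the box
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)

def reachable_box(layers, lo, hi):
    """Compose per-layer bounds into an output over-approximation."""
    for W, b in layers:
        lo, hi = relu_layer_bounds(lo, hi, W, b)
    return lo, hi

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
lo, hi = reachable_box(layers, np.full(3, -1.0), np.full(3, 1.0))
print(lo, hi)  # check this box against the unsafe regions for verification
```
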
We present polynomial-time and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x + b)$, where $x$ is drawn from the Gaussian distribution and $\sigma(t) := \max(t, 0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher-order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
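
The sketch below only instantiates the function class from this abstract, $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x + b)$ with Gaussian inputs and non-zero biases, and draws samples from it; the tensor-decomposition learning algorithm itself is not reproduced. All dimensions are illustrative assumptions.

```python
# A minimal sketch that only instantiates the function class above,
# f(x) = a^T relu(W^T x + b) with Gaussian inputs and non-zero biases,
# and draws (x, f(x)) samples; the Hermite/tensor-decomposition learning
# algorithm itself is not reproduced. Dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 4                      # input dimension and hidden width (assumed)
W = rng.standard_normal((d, k))   # hidden-layer weights
b = rng.standard_normal(k)        # general (non-zero) bias terms
a = rng.standard_normal(k)        # output weights

def f(x):
    """Depth-2 ReLU network: a^T relu(W^T x + b)."""
    return a @ np.maximum(W.T @ x + b, 0.0)

# Training samples with x ~ N(0, I_d), matching the Gaussian-input assumption.
X = rng.standard_normal((1000, d))
y = np.array([f(x) for x in X])
print(X.shape, y.shape)
```
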
Understanding the loss surface of a neural network is fundamentally important to the understanding of deep learning. This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, which are defined as local minima with higher empirical risks than the global minima. Our result demonstrates that networks with piecewise linear activations differ substantially from the well-studied linear neural networks. This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice. Essentially, the underlying assumptions are consistent with most practical circumstances where the output layer is narrower than any hidden layer. In addition, the loss surface of a neural network with piecewise linear activations is partitioned into multiple smooth and multilinear cells by nondifferentiable boundaries. The constructed spurious local minima are concentrated in one cell as a valley: they are connected with each other by a continuous path on which the empirical risk is invariant. Further, for one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
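
The cell structure described above can be observed directly: along a straight line in the parameter space of a one-hidden-layer ReLU network, the empirical risk is smooth inside a cell, and a cell boundary is crossed exactly when some hidden unit flips its activation on some training point. The sketch below counts such crossings; the toy data, network size, and endpoint parameter vectors are arbitrary assumptions.

```python
# A minimal sketch of the cell structure: walk along a straight line in the
# parameter space of a one-hidden-layer ReLU network and count how often the
# activation pattern on the training data changes, i.e. how many cell
# boundaries the line crosses. Toy data, network size, and the two endpoint
# parameter vectors are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # toy training inputs
y = rng.standard_normal(50)        # toy training targets

def pattern_and_loss(theta):
    """Unpack parameters; return the activation pattern on X and the risk."""
    W = theta[:15].reshape(5, 3)   # 5 hidden units, no biases for simplicity
    v = theta[15:]                 # output weights
    pre = X @ W.T                  # pre-activations, shape (50, 5)
    pred = np.maximum(pre, 0.0) @ v
    return pre > 0, np.mean((pred - y) ** 2)

theta0, theta1 = rng.standard_normal(20), rng.standard_normal(20)
crossings, prev = 0, None
for t in np.linspace(0.0, 1.0, 2000):
    pattern, _ = pattern_and_loss((1.0 - t) * theta0 + t * theta1)
    if prev is not None and not np.array_equal(pattern, prev):
        crossings += 1             # some unit flipped: entered a new cell
    prev = pattern
print("cell boundaries crossed along the parameter line:", crossings)
```
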
We propose a method to impose homogeneous linear inequality constraints of the form $Ax \leq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this is by means of a projection step at test time after unconstrained training; however, this is an expensive operation. By directly incorporating the constraints into the architecture, we can significantly speed up inference at test time; for instance, our experiments show a speed-up of up to two orders of magnitude over a projection method. Our algorithm computes a suitable parameterization of the feasible set at initialization and uses standard variants of stochastic gradient descent to find solutions to the constrained network. Thus, the modeling constraints are always satisfied during training. Crucially, our approach avoids solving an optimization problem at each training step and avoids manually trading off data and constraint fidelity with additional hyperparameters. We consider constrained generative modeling as an important application domain and experimentally demonstrate the proposed method by constraining a variational autoencoder.
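
For contrast with the architecture-level construction, the sketch below implements the projection baseline mentioned above, restricted to the special case of a single homogeneous constraint $a^{\mathsf{T}}x \leq 0$ (one row of $A$), where the Euclidean projection has a closed form; with a full constraint matrix the projection generally requires an optimization solve per input, which is exactly the test-time cost the method avoids. The constraint vector and the activations are illustrative assumptions.

```python
# A minimal sketch of the projection baseline discussed above, restricted to
# a single homogeneous constraint a^T x <= 0 (one row of A), where the
# Euclidean projection has a closed form. With a full constraint matrix the
# projection generally needs an optimization solve per input, which is the
# test-time cost the architecture-level construction avoids. The constraint
# vector and the activations below are illustrative assumptions.
import numpy as np

def project_halfspace(x, a):
    """Project x onto the feasible half-space {z : a^T z <= 0}."""
    violation = a @ x
    if violation <= 0.0:
        return x                              # already feasible
    return x - (violation / (a @ a)) * a      # step orthogonally to the plane

rng = np.random.default_rng(0)
a = rng.standard_normal(6)             # one homogeneous constraint
activations = rng.standard_normal(6)   # e.g. an unconstrained layer output
projected = project_halfspace(activations, a)
print(a @ activations, a @ projected)  # the second value is guaranteed <= 0
```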
