No Arabic abstract
Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce: a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.
Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.
Deep Neural Networks (DNNs) have emerged as a powerful mechanism and are being increasingly deployed in real-world safety-critical domains. Despite the widespread success, their complex architecture makes proving any formal guarantees about them difficult. Identifying how logical notions of high-level correctness relate to the complex low-level network architecture is a significant challenge. In this project, we extend the ideas presented in and introduce a way to bridge the gap between the architecture and the high-level specifications. Our key insight is that instead of directly proving the safety properties that are required, we first prove properties that relate closely to the structure of the neural net and use them to reason about the safety properties. We build theoretical foundations for our approach, and empirically evaluate the performance through various experiments, achieving promising results than the existing approach by identifying a larger region of input space that guarantees a certain property on the output.
Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of parameters, thus they are not deployable on resource-limited platforms (e.g. where RAM is limited). Compression of CNNs thereby becomes a critical problem to achieve memory-efficient and possibly computationally faster model representations. In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization, and lossless weight matrix representations based on source coding. We tested several combinations of these techniques on four benchmark datasets for classification and regression problems, achieving compression rates up to $165$ times, while preserving or improving the model performance.
The quest for biologically plausible deep learning is driven, not just by the desire to explain experimentally-observed properties of biological neural networks, but also by the hope of discovering more efficient methods for training artificial networks. In this paper, we propose a new algorithm named Variational Probably Flow (VPF), an extension of minimum probability flow for training binary Deep Boltzmann Machines (DBMs). We show that weight updates in VPF are local, depending only on the states and firing rates of the adjacent neurons. Unlike contrastive divergence, there is no need for Gibbs confabulations; and unlike backpropagation, alternating feedforward and feedback phases are not required. Moreover, the learning algorithm is effective for training DBMs with intra-layer connections between the hidden nodes. Experiments with MNIST and Fashion MNIST demonstrate that VPF learns reasonable features quickly, reconstructs corrupted images more accurately, and generates samples with a high estimated log-likelihood. Lastly, we note that, interestingly, if an asymmetric version of VPF exists, the weight updates directly explain experimental results in Spike-Timing-Dependent Plasticity (STDP).
Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have been becoming more and more attractive recently. In this paper, we investigate an emerging question: can an implicit equilibrium models equilibrium point be regarded as the solution of an optimization problem? To this end, we first decompose DNNs into a new class of unit layer that is the proximal operator of an implicit convex function while keeping its output unchanged. Then, the equilibrium model of the unit layer can be derived, named Optimization Induced Equilibrium Networks (OptEq), which can be easily extended to deep layers. The equilibrium point of OptEq can be theoretically connected to the solution of its corresponding convex optimization problem with explicit objectives. Based on this, we can flexibly introduce prior properties to the equilibrium points: 1) modifying the underlying convex problems explicitly so as to change the architectures of OptEq; and 2) merging the information into the fixed point iteration, which guarantees to choose the desired equilibrium point when the fixed point set is non-singleton. We show that deep OptEq outperforms previous implicit models even with fewer parameters. This work establishes the first step towards the optimization-guided design of deep models.