
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

Published by Huanrui Yang
Publication date: 2020
Research language: English





Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms has been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) process on every training step. The former approach usually induces a high accuracy loss, while the latter has low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures the valid form of SVD and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers on the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computation load under the same accuracy, compared to not only previous factorization methods but also state-of-the-art filter pruning methods.
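
The ingredients named in the abstract (full-rank SVD of a layer, an orthogonality penalty on the singular vectors, a sparsity penalty on the singular values, and final singular value pruning) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared-Frobenius form of the orthogonality penalty, the choice of an L1 norm as the sparsity-inducing regularizer, and the energy-based pruning threshold are all assumptions made for illustration, and the training loop itself is omitted.

```python
import numpy as np

def decompose(W):
    """Full-rank SVD of a weight matrix: W = U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def orthogonality_penalty(U, Vt):
    """Assumed form: squared Frobenius distance of U^T U and V^T V from I."""
    I = np.eye(U.shape[1])
    return (np.linalg.norm(U.T @ U - I, "fro") ** 2
            + np.linalg.norm(Vt @ Vt.T - I, "fro") ** 2)

def sparsity_penalty(s):
    """Assumed regularizer: L1 norm of the singular values."""
    return np.abs(s).sum()

def prune(U, s, Vt, energy=0.99):
    """Keep the largest singular values retaining `energy` of sum(s**2).

    The energy threshold is a hypothetical pruning rule for this sketch.
    """
    order = np.argsort(-np.abs(s))            # sort by magnitude, descending
    U, s, Vt = U[:, order], s[order], Vt[order, :]
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative energy fraction
    k = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    return U[:, :k], s[:k], Vt[:k, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A layer whose true rank (3) is lower than its shape (8 x 6) suggests.
    W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
    U, s, Vt = decompose(W)
    print("orthogonality penalty:", orthogonality_penalty(U, Vt))
    print("L1 of singular values:", sparsity_penalty(s))
    Uk, sk, Vtk = prune(U, s, Vt)
    print("kept rank:", len(sk), "of", len(s))
```

In an actual training setup, the two penalties would be added to the task loss with weighting coefficients, and `U`, `s`, and `Vt` would be the trainable parameters in place of `W`; after pruning, the layer stores `k * (m + n + 1)` values instead of `m * n`.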


Read also

58 - Masahiro Tanaka 2020
This study proposes a novel hierarchical prior for inferring possibly low-rank matrices measured with noise. We consider three-component matrix factorization, as in singular value decomposition, and its fully Bayesian inference. The proposed prior is specified by a scale mixture of exponential distributions that has spike and slab components. The weights for the spike/slab parts are inferred using a special prior based on a cumulative shrinkage process. The proposed prior is designed to increasingly aggressively push less important, or essentially redundant, singular values toward zero, leading to more accurate estimates of low-rank matrices. To ensure the parameter identification, we simulate posterior draws from an approximated posterior, in which the constraints are slightly relaxed, using a No-U-Turn sampler. By means of a set of simulation studies, we show that our proposal is competitive with alternative prior specifications and that it does not incur significant additional computational burden. We apply the proposed approach to sectoral industrial production in the United States to analyze the structural change during the Great Moderation period.
Quaternion matrix approximation problems construct the approximated matrix via the quaternion singular value decomposition (SVD) by selecting some SVD triplets of quaternion matrices. In applications such as color image processing and recognition problems, only a small number of dominant SVD triplets are selected, while in some applications, such as the quaternion total least squares problem, small SVD triplets (small singular values and associated singular vectors) and the numerical rank with respect to a small threshold are required. In this paper, we propose a randomized quaternion SVD (randsvdQ) method to compute a small number of SVD triplets of a large-scale quaternion matrix. Theoretical results are given about approximation errors, and the corresponding algorithm adapts to the low-rank matrix approximation problem. When the restricted rank increases, it might lead to information loss of small SVD triplets. The blocked quaternion randomized SVD algorithm is then developed for cases where the numerical rank and information about small singular values are required. For color face recognition problems, numerical results show good performance of the developed quaternion randomized SVD method for low-rank approximation of a large-scale quaternion matrix. The blocked randomized SVD algorithm is also shown to be more robust than the unblocked method through several experiments, and approximation errors from the blocked scheme are very close to the optimal error obtained by truncating a full SVD.
203 - Soufiane Belharbi 2018
Neural network models and deep models are among the leading, state-of-the-art models in machine learning. The most successful deep neural models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which is not always available. One of the fundamental issues in neural networks is overfitting, which is the issue tackled in this thesis. This problem often occurs when large models are trained using few training samples. Many approaches have been proposed to prevent the network from overfitting and improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, batch normalization, etc. In this thesis, we tackle the neural network overfitting issue from a representation learning perspective by considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first one, presented in chapter 2, is dedicated to dealing with structured output problems to perform multivariate regression when the output variable y contains structural dependencies between its components. The second contribution, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. Our last contribution, presented in chapter 4, shows the interest of transfer learning in applications where only few samples are available. In this contribution, we provide an automatic system based on such a learning scheme with an application to the medical domain. In this application, the task consists in localizing the third lumbar vertebra in a 3D CT scan. This work was done in collaboration with the clinic Rouen Henri Becquerel Center, which provided us with data.
145 - Xinping Yi 2020
In convolutional neural networks, the linear transformation of multi-channel two-dimensional convolutional layers with linear convolution is a block matrix with doubly Toeplitz blocks. Although a wrapping around operation can transform linear convolution to a circular one, by which the singular values can be approximated with reduced computational complexity by those of a block matrix with doubly circulant blocks, the accuracy of such an approximation is not guaranteed. In this paper, we propose to inspect such a linear transformation matrix through its asymptotic spectral representation - the spectral density matrix - by which we develop a simple singular value approximation method with improved accuracy over the circular approximation, as well as upper bounds for spectral norm with reduced computational complexity. Compared with the circular approximation, we obtain moderate improvement with a subtle adjustment of the singular value distribution. We also demonstrate that the spectral norm upper bounds are effective spectral regularizers for improving generalization performance in ResNets.
A deep neural network model is a powerful framework for learning representations. Usually, it is used to learn the relation $x \to y$ by exploiting the regularities in the input $x$. In structured output prediction problems, $y$ is multi-dimensional and structural relations often exist between the dimensions. The motivation of this work is to learn the output dependencies that may lie in the output data in order to improve the prediction accuracy. Unfortunately, feedforward networks are unable to exploit the relations between the outputs. In order to overcome this issue, we propose in this paper a regularization scheme for training neural networks for these particular tasks using a multi-task framework. Our scheme aims at incorporating the learning of the output representation $y$ in the training process in an unsupervised fashion while learning the supervised mapping function $x \to y$. We evaluate our framework on a facial landmark detection problem, which is a typical structured output task. We show over two public challenging datasets (LFPW and HELEN) that our regularization scheme improves the generalization of deep neural networks and accelerates their training. The use of unlabeled data and label-only data is also explored, showing an additional improvement of the results. We provide an open-source implementation (https://github.com/sbelharbi/structured-output-ae) of our framework.
