Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent and thus converge slowly. In addition, softmax, as the decision layer, may ignore the distribution information of the data during classification. To tackle these problems, we propose a novel manifold neural network based on non-gradient optimization, i.e., closed-form solutions. Since the activation function is generally invertible, we reconstruct the network via forward ridge regression and a low-rank backward approximation, which achieves rapid convergence. Moreover, by unifying a flexible Stiefel manifold with an adaptive support vector machine, we devise a novel decision layer that efficiently fits the manifold structure of the data and the label information. A joint non-gradient optimization method is then designed to train the network with closed-form results. Finally, extensive experiments validate the superior performance of the model.
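To make the closed-form idea concrete, the following is a minimal NumPy sketch under our own simplifying assumptions, not the paper's exact formulation: with an invertible activation (leaky ReLU here), each layer's weights admit a closed-form ridge-regression solution; targets can be pushed backward through a truncated (low-rank) pseudo-inverse instead of gradients; and a Stiefel-manifold-constrained layer has the closed-form orthogonal Procrustes solution. The names `fit_layer`, `backward_target`, and `fit_stiefel` are illustrative, not from the paper.

```python
import numpy as np

def leaky_relu(z, a=0.1):
    return np.where(z > 0, z, a * z)

def leaky_relu_inv(h, a=0.1):
    # Leaky ReLU is strictly monotonic, hence invertible.
    return np.where(h > 0, h, h / a)

def fit_layer(X, H_target, lam=1e-2):
    """Closed-form ridge regression through the activation:
    W = (X^T X + lam*I)^{-1} X^T sigma^{-1}(H_target)."""
    Z = leaky_relu_inv(H_target)          # pull the target back through sigma
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def backward_target(H_target, W, rank=None):
    """Push a layer's target backward via an (optionally low-rank)
    pseudo-inverse, replacing gradient backpropagation."""
    Z = leaky_relu_inv(H_target)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if rank is not None:                  # low-rank backward approximation
        U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    return Z @ (Vt.T @ np.diag(1.0 / s) @ U.T)

def fit_stiefel(X, Y):
    """Closed-form orthogonal Procrustes step: argmin_{W^T W = I} ||XW - Y||_F
    is W = U V^T, where X^T Y = U S V^T."""
    U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U @ Vt

# Toy check: one closed-form solve recovers a planted layer, no iterations.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 16))
W_true = rng.standard_normal((16, 8))
H_target = leaky_relu(X @ W_true)
W = fit_layer(X, H_target, lam=1e-6)
print(np.linalg.norm(leaky_relu(X @ W) - H_target))  # ~0
```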
Deep Neural Networks (DNNs) have become increasingly popular in computer vision, natural language processing, and other areas. However, training and fine-tuning a deep learning model is computationally intensive and time-consuming. We propose a new …
Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).
Over-parameterization of neural networks benefits the optimization and generalization yet brings cost in practice. Pruning is adopted as a post-processing solution to this problem, which aims to remove unnecessary parameters in a neural network with …
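The snippet above describes the generic post-processing setting; a minimal sketch of the simplest instance, unstructured magnitude pruning, is given below. This illustrates the general technique only, not the cited paper's specific method; `magnitude_prune` and `sparsity` are illustrative names.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold        # keep only large-magnitude weights
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print(f"kept {mask.mean():.1%} of parameters")  # ~10.0%
```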
Neural architecture search (NAS) with an accuracy predictor that predicts the accuracy of candidate architectures has drawn increasing attention due to its simplicity and effectiveness. Previous works usually employ neural network-based predictors …
Several recent works [40, 24] observed an interesting phenomenon in neural network pruning: a larger finetuning learning rate can improve the final performance significantly. Unfortunately, the reason behind it remains elusive to date. This paper …