Weight initialization plays an important role in training neural networks and affects a wide range of deep learning applications. Various weight initialization strategies have been developed for different activation functions and network architectures. These initialization algorithms are typically based on minimizing the variance of the parameters between layers and may still fail when neural networks are deep, e.g., due to dying ReLU. To address this challenge, we study neural networks from a nonlinear computation point of view and propose a novel weight initialization strategy based on the linear product structure (LPS) of neural networks. The proposed strategy is derived from a polynomial approximation of activation functions and uses theories of numerical algebraic geometry to guarantee finding all local minima. We also provide a theoretical analysis showing that the LPS initialization has a lower probability of producing dying ReLU than existing initialization strategies. Finally, we test the LPS initialization algorithm on both fully connected neural networks and convolutional neural networks to demonstrate its feasibility, efficiency, and robustness on public datasets.
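As an illustration of the dying-ReLU phenomenon mentioned above, the following minimal sketch estimates the fraction of ReLU units that output zero for every sample in a deep, randomly initialized fully connected network under the standard He and Xavier initializations. The network depth, width, batch size, and the dead-unit criterion are illustrative assumptions; this sketch does not implement the proposed LPS initialization.

```python
# Illustrative sketch (not the paper's LPS method): estimate the fraction of
# ReLU units that are "dead on this batch" (zero output for all samples) in a
# deep, randomly initialized fully connected network.
import numpy as np

def dead_relu_fraction(depth=20, width=256, n_samples=256, init="he", seed=0):
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n_samples, width))  # random input batch
    dead, total = 0, 0
    for _ in range(depth):
        if init == "he":
            # He initialization: std = sqrt(2 / fan_in)
            w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        else:
            # Xavier/Glorot initialization: std = sqrt(2 / (fan_in + fan_out))
            w = rng.standard_normal((width, width)) * np.sqrt(2.0 / (2 * width))
        h = np.maximum(h @ w, 0.0)  # linear layer followed by ReLU
        # A unit counts as dead if it outputs zero for every sample in the batch.
        dead += int(np.sum(np.all(h == 0.0, axis=0)))
        total += width
    return dead / total

if __name__ == "__main__":
    for init in ("he", "xavier"):
        print(init, dead_relu_fraction(init=init))
```

Counting units that are zero across an entire batch is only a proxy for permanently dead neurons, but it is a simple way to compare how quickly different initializations deactivate units as depth grows.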
Echo State Networks (ESNs) are a special type of recurrent neural network (RNN) in which the input and recurrent connections are traditionally generated randomly and only the output weights are trained. Despite the recent success of ESNs in vario
We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: the total variation and the Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve some partia
This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight-normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q \le p^*$,
This paper concerns the minimax center of a collection of linear subspaces. When the subspaces are $k$-dimensional subspaces of $\mathbb{R}^n$, this can be cast as finding the center of a minimum enclosing ball on a Grassmann manifold, $\mathrm{Gr}(k,n)$. For
Structured CNNs designed using prior information about the problem can potentially improve efficiency over conventional CNNs in various tasks, such as solving PDEs and inverse problems in signal processing. This paper introduces BNet2, a simplified Butterfly-Net