In Neural Networks (NNs), Adaptive Activation Functions (AAFs) have parameters that control the shapes of the activation functions; these parameters are trained along with the other parameters of the network. AAFs have improved the performance of NNs on multiple classification tasks. In this paper, we propose and apply AAFs to feedforward NNs for regression tasks. We argue that applying AAFs in the regression (second-to-last) layer of an NN can significantly decrease the bias of the regression NN. However, using existing AAFs may lead to overfitting. To address this problem, we propose a Smooth Adaptive Activation Function (SAAF) with a piecewise polynomial form that can approximate any continuous function to any desired degree of accuracy. NNs with SAAFs can avoid overfitting by simply regularizing their parameters; in particular, an NN with SAAFs is Lipschitz continuous given a bounded magnitude of its parameters. We prove an upper bound on model complexity, in terms of the fat-shattering dimension, for any Lipschitz continuous regression model. Thus, regularizing the parameters of NNs with SAAFs avoids overfitting. We empirically evaluated NNs with SAAFs and achieved state-of-the-art results on multiple regression datasets.
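For intuition, the following is a minimal sketch (in PyTorch, which the paper does not necessarily use) of an adaptive activation built from fixed breakpoints with trainable mixing coefficients, plus an L2 penalty on those coefficients. It is a simplified piecewise-linear stand-in for the paper's piecewise-polynomial SAAF, not the authors' exact construction; the class name, breakpoint range, and layer sizes are illustrative assumptions. Penalizing the coefficients bounds the slope of the learned activation, which is the mechanism behind the Lipschitz-continuity argument.

    import torch
    import torch.nn as nn

    class PiecewiseAdaptiveActivation(nn.Module):
        """Trainable activation: identity plus a learned combination of ReLU
        hinges at fixed breakpoints (a piecewise-linear stand-in for the
        piecewise-polynomial SAAF described in the abstract)."""

        def __init__(self, num_breaks: int = 5, lo: float = -2.0, hi: float = 2.0):
            super().__init__()
            # Breakpoints are fixed; only the mixing coefficients are trained.
            self.register_buffer("breaks", torch.linspace(lo, hi, num_breaks))
            self.coeffs = nn.Parameter(torch.zeros(num_breaks))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            hinges = torch.relu(x.unsqueeze(-1) - self.breaks)  # (..., num_breaks)
            return x + (hinges * self.coeffs).sum(dim=-1)

        def l2_penalty(self) -> torch.Tensor:
            # Regularizing the shape coefficients bounds the activation's slope,
            # keeping the overall network Lipschitz continuous.
            return self.coeffs.pow(2).sum()

    # Usage: add the penalty to the regression loss during training, e.g.
    #   loss = mse_loss(pred, target) + reg_weight * act.l2_penalty()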
We propose orthogonal-Padé activation functions, which are trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models. Based on our experiments, we […]
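As a rough illustration of a trainable rational (Padé-style) activation, here is a generic PyTorch sketch: a ratio of polynomials whose coefficients are learned jointly with the network, with the denominator kept at least 1 to avoid poles. This is a simplified assumption-laden example, not the orthogonal-polynomial-basis construction proposed in the paper.

    import torch
    import torch.nn as nn

    class RationalActivation(nn.Module):
        """Generic Pade-style trainable activation P(x)/Q(x): numerator and
        denominator coefficients are learned along with the other weights."""

        def __init__(self, num_degree: int = 5, den_degree: int = 4):
            super().__init__()
            self.num_coeffs = nn.Parameter(0.1 * torch.randn(num_degree + 1))
            self.den_coeffs = nn.Parameter(0.1 * torch.randn(den_degree))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # P(x) = a_0 + a_1 x + ... + a_p x^p
            num = sum(c * x.pow(i) for i, c in enumerate(self.num_coeffs))
            # Q(x) = 1 + |b_1 x + ... + b_q x^q|  (always >= 1, so no poles)
            den = 1 + torch.abs(
                sum(c * x.pow(i + 1) for i, c in enumerate(self.den_coeffs))
            )
            return num / den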
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for […]
The scope of research on activation functions remains limited, centered on improving the ease of optimization or the generalization quality of neural networks (NNs). However, to develop a deeper understanding of deep learning, it […]
Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing […]
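To make the interleaving pattern concrete, a minimal PyTorch example (with hypothetical layer sizes) is a stack of linear layers alternating with PReLU, a simple activation whose negative slope is learned jointly with the weights:

    import torch.nn as nn

    # Adaptable linear layers interleaved with PReLU, whose negative slope is
    # itself a trainable parameter learned alongside the weights.
    model = nn.Sequential(
        nn.Linear(16, 64), nn.PReLU(),
        nn.Linear(64, 64), nn.PReLU(),
        nn.Linear(64, 1),
    )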
We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with a quadratic activation function in the over-parametrized regime, where the layer width $m$ is larger than the input dimension $d$. We […]
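For reference, a standard parameterization of such a one-hidden-layer quadratic network (the abstract is cut off, so the exact setup below is an assumption) is

$$ f(x; W, a) \;=\; \sum_{j=1}^{m} a_j\,\sigma\!\left(w_j^{\top} x\right), \qquad \sigma(z) = z^2, \qquad x \in \mathbb{R}^{d},\ m > d, $$

where $W = (w_1, \dots, w_m)$ collects the hidden-layer weights and $a \in \mathbb{R}^m$ the output weights; over-parametrization corresponds to $m > d$.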