No Arabic abstract
The activation function plays a fundamental role in the artificial neural network learning process. However, there is no obvious choice or procedure to determine the best activation function, which depends on the problem. This study proposes a new artificial neuron, named global-local neuron, with a trainable activation function composed of two components, a global and a local. The global component term used here is relative to a mathematical function to describe a general feature present in all problem domain. The local component is a function that can represent a localized behavior, like a transient or a perturbation. This new neuron can define the importance of each activation function component in the learning phase. Depending on the problem, it results in a purely global, or purely local, or a mixed global and local activation function after the training phase. Here, the trigonometric sine function was employed for the global component and the hyperbolic tangent for the local component. The proposed neuron was tested for problems where the target was a purely global function, or purely local function, or a composition of two global and local functions. Two classes of test problems were investigated, regression problems and differential equations solving. The experimental tests demonstrated the Global-Local Neuron networks superior performance, compared with simple neural networks with sine or hyperbolic tangent activation function, and with a hybrid network that combines these two simple neural networks.
An activation function is a crucial component of a neural network that introduces non-linearity in the network. The state-of-the-art performance of a neural network depends on the perfect choice of an activation function. We propose two novel non-monotonic smooth trainable activation functions, called ErfAct-1 and ErfAct-2. Experiments suggest that the proposed functions improve the network performance significantly compared to the widely used activations like ReLU, Swish, and Mish. Replacing ReLU by ErfAct-1 and ErfAct-2, we have 5.21% and 5.04% improvement for top-1 accuracy on PreactResNet-34 network in CIFAR100 dataset, 2.58% and 2.76% improvement for top-1 accuracy on PreactResNet-34 network in CIFAR10 dataset, 1.0%, and 1.0% improvement on mean average precision (mAP) on SSD300 model in Pascal VOC dataset.
We have proposed orthogonal-Pade activation functions, which are trainable activation functions and show that they have faster learning capability and improves the accuracy in standard deep learning datasets and models. Based on our experiments, we have found two best candidates out of six orthogonal-Pade activations, which we call safe Hermite-Pade (HP) activation functions, namely HP-1 and HP-2. When compared to ReLU, HP-1 and HP-2 has an increment in top-1 accuracy by 5.06% and 4.63% respectively in PreActResNet-34, by 3.02% and 2.75% respectively in MobileNet V2 model on CIFAR100 dataset while on CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, by 2.15% and 2.03% respectively in Efficientnet B0.
Continual learning is a concept of online learning with multiple sequential tasks. One of the critical barriers of continual learning is that a network should learn a new task keeping the knowledge of old tasks without access to any data of the old tasks. In this paper, we propose a neuron activation importance-based regularization method for stable continual learning regardless of the order of tasks. We conduct comprehensive experiments on existing benchmark data sets to evaluate not just the stability and plasticity of our method with improved classification accuracy also the robustness of the performance along the changes of task order.
Significant computational cost and memory requirements for deep neural networks (DNNs) make it difficult to utilize DNNs in resource-constrained environments. Binary neural network (BNN), which uses binary weights and binary activations, has been gaining interests for its hardware-friendly characteristics and minimal resource requirement. However, BNN usually suffers from accuracy degradation. In this paper, we introduce BitSplit-Net, a neural network which maintains the hardware-friendly characteristics of BNN while improving accuracy by using multi-bit precision. In BitSplit-Net, each bit of multi-bit activations propagates independently throughout the network before being merged at the end of the network. Thus, each bit path of the BitSplit-Net resembles BNN and hardware friendly features of BNN, such as bitwise binary activation function, are preserved in our scheme. We demonstrate that the BitSplit version of LeNet-5, VGG-9, AlexNet, and ResNet-18 can be trained to have similar classification accuracy at a lower computational cost compared to conventional multi-bit networks with low bit precision (<= 4-bit). We further evaluate BitSplit-Net on GPU with custom CUDA kernel, showing that BitSplit-Net can achieve better hardware performance in comparison to conventional multi-bit networks.
In this paper, we propose LGG, a neurologically inspired graph neural network, to learn local-global-graph representations from Electroencephalography (EEG) for a Brain-Computer Interface (BCI). A temporal convolutional layer with multi-scale 1D convolutional kernels and kernel-level attention fusion is proposed to learn the temporal dynamics of EEG. Inspired by neurological knowledge of cognitive processes in the brain, we propose local and global graph-filtering layers to learn the brain activities within and between different functional areas of the brain to model the complex relations among them during the cognitive processes. Under the robust nested cross-validation settings, the proposed method is evaluated on the publicly available dataset DEAP, and the classification performance is compared with state-of-the-art methods, such as FBFgMDM, FBTSC, Unsupervised learning, DeepConvNet, ShallowConvNet, EEGNet, and TSception. The results show that the proposed method outperforms all these state-of-the-art methods, and the improvements are statistically significant (p<0.05) in most cases. The source code can be found at: https://github.com/yi-ding-cs/LGG