No Arabic abstract
Echo State Networks (ESNs) are a special type of recurrent neural networks (RNNs), in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various tasks of audio, image and radar recognition, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the K-Means algorithm on the training data. We show that for a large variety of datasets this initialization performs equivalently or superior than a randomly initialized ESN whilst needing significantly less reservoir neurons. Furthermore, we discuss that this approach provides the opportunity to estimate a suitable size of the reservoir based on prior knowledge about the data.
Weight initialization plays an important role in training neural networks and also affects tremendous deep learning applications. Various weight initialization strategies have already been developed for different activation functions with different neural networks. These initialization algorithms are based on minimizing the variance of the parameters between layers and might still fail when neural networks are deep, e.g., dying ReLU. To address this challenge, we study neural networks from a nonlinear computation point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks. The proposed strategy is derived from the polynomial approximation of activation functions by using theories of numerical algebraic geometry to guarantee to find all the local minima. We also provide a theoretical analysis that the LPS initialization has a lower probability of dying ReLU comparing to other existing initialization strategies. Finally, we test the LPS initialization algorithm on both fully connected neural networks and convolutional neural networks to show its feasibility, efficiency, and robustness on public datasets.
Cancer is still one of the most devastating diseases of our time. One way of automatically classifying tumor samples is by analyzing its derived molecular information (i.e., its genes expression signatures). In this work, we aim to distinguish three different types of cancer: thyroid, skin, and stomach. For that, we compare the performance of a Denoising Autoencoder (DAE) used as weight initialization of a deep neural network. Although we address a different domain problem in this work, we have adopted the same methodology of Ferreira et al.. In our experiments, we assess two different approaches when training the classification model: (a) fixing the weights, after pre-training the DAE, and (b) allowing fine-tuning of the entire classification network. Additionally, we apply two different strategies for embedding the DAE into the classification network: (1) by only importing the encoding layers, and (2) by inserting the complete autoencoder. Our best result was the combination of unsupervised feature learning through a DAE, followed by its full import into the classification network, and subsequent fine-tuning through supervised training, achieving an F1 score of 98.04% +/- 1.09 when identifying cancerous thyroid samples.
In this paper, the echo state network (ESN) memory capacity, which represents the amount of input data an ESN can store, is analyzed for a new type of deep ESNs. In particular, two deep ESN architectures are studied. First, a parallel deep ESN is proposed in which multiple reservoirs are connected in parallel allowing them to average outputs of multiple ESNs, thus decreasing the prediction error. Then, a series architecture ESN is proposed in which ESN reservoirs are placed in cascade that the output of each ESN is the input of the next ESN in the series. This series ESN architecture can capture more features between the input sequence and the output sequence thus improving the overall prediction accuracy. Fundamental analysis shows that the memory capacity of parallel ESNs is equivalent to that of a traditional shallow ESN, while the memory capacity of series ESNs is smaller than that of a traditional shallow ESN.In terms of normalized root mean square error, simulation results show that the parallel deep ESN achieves 38.5% reduction compared to the traditional shallow ESN while the series deep ESN achieves 16.8% reduction.
Prediction based on Irregularly Sampled Time Series (ISTS) is of wide concern in the real-world applications. For more accurate prediction, the methods had better grasp more data characteristics. Different from ordinary time series, ISTS is characterised with irregular time intervals of intra-series and different sampling rates of inter-series. However, existing methods have suboptimal predictions due to artificially introducing new dependencies in a time series and biasedly learning relations among time series when modeling these two characteristics. In this work, we propose a novel Time Encoding (TE) mechanism. TE can embed the time information as time vectors in the complex domain. It has the the properties of absolute distance and relative distance under different sampling rates, which helps to represent both two irregularities of ISTS. Meanwhile, we create a new model structure named Time Encoding Echo State Network (TE-ESN). It is the first ESNs-based model that can process ISTS data. Besides, TE-ESN can incorporate long short-term memories and series fusion to grasp horizontal and vertical relations. Experiments on one chaos system and three real-world datasets show that TE-ESN performs better than all baselines and has better reservoir property.
Since their inception, learning techniques under the Reservoir Computing paradigm have shown a great modeling capability for recurrent systems without the computing overheads required for other approaches. Among them, different flavors of echo state networks have attracted many stares through time, mainly due to the simplicity and computational efficiency of their learning algorithm. However, these advantages do not compensate for the fact that echo state networks remain as black-box models whose decisions cannot be easily explained to the general audience. This work addresses this issue by conducting an explainability study of Echo State Networks when applied to learning tasks with time series, image and video data. Specifically, the study proposes three different techniques capable of eliciting understandable information about the knowledge grasped by these recurrent models, namely, potential memory, temporal patterns and pixel absence effect. Potential memory addresses questions related to the effect of the reservoir size in the capability of the model to store temporal information, whereas temporal patterns unveils the recurrent relationships captured by the model over time. Finally, pixel absence effect attempts at evaluating the effect of the absence of a given pixel when the echo state network model is used for image and video classification. We showcase the benefits of our proposed suite of techniques over three different domains of applicability: time series modeling, image and, for the first time in the related literature, video classification. Our results reveal that the proposed techniques not only allow for a informed understanding of the way these models work, but also serve as diagnostic tools capable of detecting issues inherited from data (e.g. presence of hidden bias).