
How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network?

Posted by: Yining Wang
Publication date: 2018
Paper language: English





It is widely believed that the practical success of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) owes to the fact that CNNs and RNNs use a more compact parametric representation than their Fully-Connected Neural Network (FNN) counterparts, and consequently require fewer training examples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample-complexity of estimating CNNs and RNNs. We show that the sample-complexity to learn CNNs and RNNs scales linearly with their intrinsic dimension and this sample-complexity is much smaller than for their FNN counterparts. For both CNNs and RNNs, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main technical tools for deriving these results are a localized empirical process analysis and a new technical lemma characterizing the convolutional and recurrent structure. We believe that these tools may inspire further developments in understanding CNNs and RNNs.
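As a rough illustration of the gap (the parameter counting here is ours, not the paper's exact theorem statement): a fully-connected layer mapping $\mathbb{R}^d$ to $\mathbb{R}^d$ carries $d^2$ free parameters, while a convolutional layer that shares a single filter of length $m \ll d$ carries only $m$. A sample complexity that is linear in this intrinsic dimension then means on the order of $\tilde{O}(m)$ training examples for the convolutional model versus $\tilde{O}(d^2)$ for its fully-connected counterpart, where $\tilde{O}(\cdot)$ hides logarithmic factors.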




Read also

From a practical perspective it is advantageous to develop experimental methods that verify entanglement in quantum states with as few measurements as possible. In this paper we investigate the minimal number of measurements needed to detect bound entanglement in bipartite $(d\times d)$-dimensional states, i.e. entangled states that are positive under partial transposition. In particular, we show that a class of entanglement witnesses composed of mutually unbiased bases (MUBs) can detect bound entanglement if the number of measurements is greater than $d/2+1$. This is a substantial improvement over other detection methods, requiring significantly fewer resources than either full quantum state tomography or measuring a complete set of $d+1$ MUBs. Our approach is based on a partial characterisation of the (non-)decomposability of entanglement witnesses. We show that non-decomposability is a universal property of MUBs, which holds regardless of the choice of complementary observables, and we find that both the number of measurements and the structure of the witness play an important role in the detection of bound entanglement.
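For a sense of scale (the arithmetic is ours, for illustration only): in dimension $d = 4$, the MUB-based witness needs just over $d/2 + 1 = 3$ measurement settings, measuring a complete set of MUBs needs $d + 1 = 5$, and full tomography of the bipartite state must determine $d^4 - 1 = 255$ real parameters.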
This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable, and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on the Merck molecular activity data set.
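To make the random-walk analogy concrete, here is a minimal NumPy sketch of such a convolution (the function name, the transition matrix $P = D^{-1}A$, and the per-step weights are our illustrative assumptions, not the authors' implementation):

import numpy as np

def random_walk_conv(A, X, W, k=3):
    """Toy random-walk 'convolution' on a graph (illustrative sketch).

    A : (n, n) adjacency matrix
    X : (n, f_in) node features
    W : list of k+1 weight matrices, each (f_in, f_out)
    Aggregates features of nodes reachable in 0..k random-walk steps,
    analogous to how a grid convolution aggregates a pixel's neighborhood.
    """
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1)           # row-stochastic transition matrix D^{-1} A
    out = X @ W[0]                       # 0-step term: the node itself
    Pt = np.eye(A.shape[0])
    for t in range(1, k + 1):
        Pt = Pt @ P                      # t-step transition probabilities
        out = out + (Pt @ X) @ W[t]      # aggregate expected t-step neighborhood features
    return out

# Example: 4-node path graph, 2 input features, 2 output features
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
X = np.random.randn(4, 2)
W = [np.random.randn(2, 2) for _ in range(4)]
print(random_walk_conv(A, X, W).shape)   # (4, 2)

Stacking such layers and learning the weights W end-to-end would recover a CNN-like architecture on graphs, and the same operation applies unchanged when the graph structure varies between inputs.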
Zhen Fu, Bo Wang, Xihong Wu (2021)
The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although the linear model-based method has been widely used in AAD, the linear assumption was considered oversimplified and the decoding accuracy remained lower for shorter decoding windows. Recently, nonlinear models based on deep neural networks (DNNs) have been proposed to solve this problem. However, these models did not fully utilize both the spatial and temporal features of EEG, and the interpretability of DNN models was rarely investigated. In this paper, we proposed novel convolutional recurrent neural network (CRNN)-based regression and classification models, and compared them with both the linear model and state-of-the-art DNN models. Results showed that our proposed CRNN-based classification model outperformed the others for shorter decoding windows (around 90% accuracy for 2 s and 5 s). Although worse than the classification models, the decoding accuracy of the proposed CRNN-based regression model was about 5% greater than that of other regression models. The interpretability of the DNN models was also investigated by visualizing layer weights.
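As a concrete picture of this model class, here is a minimal PyTorch sketch of a CRNN classifier for EEG windows (the channel count, layer sizes, and input shape are our assumptions, not the authors' architecture):

import torch
import torch.nn as nn

class CRNNClassifier(nn.Module):
    """Toy convolutional-recurrent classifier for EEG windows.

    Input: (batch, channels, time). All sizes are illustrative guesses,
    not the architecture from the paper.
    """
    def __init__(self, n_channels=64, n_classes=2):
        super().__init__()
        # spatial feature extraction across EEG channels
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # temporal modeling of the convolved feature sequence
        self.rnn = nn.GRU(32, 16, batch_first=True)
        self.fc = nn.Linear(16, n_classes)   # attended-talker logits

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.conv(x)                     # (batch, 32, time/4)
        h = h.transpose(1, 2)                # (batch, time/4, 32) for the GRU
        _, hn = self.rnn(h)                  # hn: (1, batch, 16)
        return self.fc(hn.squeeze(0))        # (batch, n_classes)

x = torch.randn(8, 64, 128)                  # 8 windows, 64 channels, 128 samples
print(CRNNClassifier()(x).shape)             # torch.Size([8, 2])

The convolutional front end plays the role of the spatial features and the recurrent layer that of the temporal features the paper argues earlier DNN models did not fully exploit.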
Unequivocally, a single man in possession of a strong password is not enough to solve the issue of security. Studies indicate that passwords have been subjected to various attacks, regardless of the applied protection mechanisms, due to the human factor. The keystone for the adoption of more efficient authentication methods by the different markets is the trade-off between security and usability. To bridge the gap between user-friendly interfaces and advanced security features, the Fast Identity Online (FIDO) alliance defined several authentication protocols. Although FIDO's biometric-based authentication is not a novel concept, it still daunts end users and developers, which may be a contributing factor obstructing FIDO's complete dominance of the digital authentication market. This paper traces the evolution of the FIDO protocols, by identifying the technical characteristics and security requirements of the FIDO protocols throughout their different versions.
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently diverse. Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.
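The claim that ReLU MLPs become linear far from the training support is easy to probe empirically; the following PyTorch sketch (a toy experiment of our own, not the paper's setup) fits $y = x^2$ on $[-1, 1]$ and then checks that the fitted function's slope stops changing outside that interval:

import torch
import torch.nn as nn

# Illustrative demo (ours): fit y = x^2 on [-1, 1], then probe far outside
# the training support, where a ReLU MLP behaves (piecewise) linearly.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = x ** 2
mlp = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((mlp(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    # Far from the origin the fitted function's slope is ~constant:
    # the secant f(a+1) - f(a) barely changes with a, unlike the true
    # quadratic, whose secant 2a + 1 keeps growing.
    for a in [2.0, 5.0, 10.0]:
        f = lambda t: mlp(torch.tensor([[t]])).item()
        print(a, f(a + 1) - f(a))   # roughly the same value for every a

Beyond the last ReLU breakpoint along a ray from the origin, every hidden unit is stuck on one side of its kink, so the network computes an exactly affine function of its input; this is the mechanism behind the paper's first observation.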
