Spectrum concentration in deep residual learning: a free probability approach

Posted by Zenan Ling
Publication date: 2018
Language: English

We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool from free probability to the deep learning community. This tool deals with non-Hermitian random matrices, rather than the conventional Hermitian ones in the literature. As a consequence, it enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet in both the linear and nonlinear cases. With this tool, we conduct an asymptotic analysis of the spectrum in the single-layer case and then extend the analysis to the multi-layer case with an arbitrary number of layers. In particular, we propose rescaling the classical random initialization by the number of residual units, so that the spectrum remains of order $O(1)$ even when the width and depth of the network are large. We empirically demonstrate that the proposed initialization scheme learns orders of magnitude faster than classical ones, attesting to the strong practical relevance of this investigation.
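
To make the proposed rescaling concrete, here is a minimal numerical sketch (not the authors' code; the exact scaling constant is an assumption for illustration). It forms the input-output Jacobian of a linear ResNet, $J = \prod_{l=1}^{L}(I + W_l)$, and compares the singular values under a classical i.i.d. Gaussian initialization of variance $1/\text{width}$ against the same initialization rescaled by the number of residual units $L$:

```python
import numpy as np

def resnet_jacobian(width, depth, scale, rng):
    """Input-output Jacobian of a linear ResNet, J = prod_l (I + W_l),
    where each W_l has i.i.d. N(0, scale / width) entries."""
    J = np.eye(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(scale / width)
        J = (np.eye(width) + W) @ J
    return J

rng = np.random.default_rng(0)
width, depth = 200, 50

sv_classical = np.linalg.svd(resnet_jacobian(width, depth, 1.0, rng), compute_uv=False)
sv_rescaled = np.linalg.svd(resnet_jacobian(width, depth, 1.0 / depth, rng), compute_uv=False)

print(f"classical init: largest singular value = {sv_classical.max():.2e}")  # blows up with depth
print(f"rescaled init : largest singular value = {sv_rescaled.max():.2e}")   # stays O(1)
```

Under the rescaled initialization, the Jacobian spectrum stays well conditioned as width and depth grow, which is what makes gradient propagation, and hence training speed, robust to depth.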



Read also

In this study, a novel topology optimization approach based on conditional Wasserstein generative adversarial networks (CWGAN) is developed to replicate conventional topology optimization algorithms in an extremely computationally inexpensive way. A CWGAN consists of a generator and a discriminator, both of which are deep convolutional neural networks (CNNs). The limited training data, quasi-optimal planar structures, are generated using conventional topology optimization algorithms. With CWGANs, the topology optimization conditions can be set to required values before generating samples, and the CWGAN truncates the global design space via an equality constraint set by the designer. The results are validated by generating an optimized planar structure using the conventional algorithms with the same settings. A proof of concept is presented which, to the best of our knowledge, is the first such illustration of fusing CWGANs with topology optimization.
We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG on the DeepMind Control Suite benchmark. Moreover, we find residual algorithms an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD($k$) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.
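
For readers unfamiliar with the terminology: "residual" here refers to residual-gradient algorithms in the sense of Baird, which differentiate the squared Bellman error through both the current and the successor value estimate, whereas semi-gradient TD treats the bootstrapped target as a constant. A minimal sketch of the two update rules for a linear value function (the paper's bidirectional target network is not reproduced here):

```python
import numpy as np

def td_step(w, phi_s, phi_s_next, r, gamma, lr):
    """One semi-gradient TD(0) step vs. one residual-gradient step
    for a linear value function V(s) = w @ phi(s)."""
    delta = r + gamma * w @ phi_s_next - w @ phi_s   # TD error (Bellman residual)
    w_semi = w + lr * delta * phi_s                  # semi-gradient: target held fixed
    w_residual = w - lr * delta * (gamma * phi_s_next - phi_s)  # full gradient of delta^2 / 2
    return w_semi, w_residual
```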
Deep neural networks enjoy powerful representations and have proven effective in a number of applications. However, recent advances show that deep neural networks are vulnerable to adversarial attacks via so-called adversarial examples: although an adversarial example differs only slightly from the input sample, the neural network classifies it into the wrong class. To alleviate this problem, we propose the Deep Minimax Probability Machine (DeepMPM), which applies MPM to deep neural networks in an end-to-end fashion. In a worst-case scenario, MPM minimizes an upper bound on the probability of misclassification, using the global information of each class (i.e., its mean and covariance). DeepMPM can thus be more robust, since it learns a worst-case bound on the probability of misclassifying future data. Experiments on two real-world datasets show that DeepMPM achieves classification performance comparable to CNNs while being more robust to adversarial attacks.
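
For reference, the worst-case bound that classical MPM minimizes has a closed convex form (due to Lanckriet et al.): find $w$ minimizing $\sqrt{w^\top\Sigma_+ w} + \sqrt{w^\top\Sigma_- w}$ subject to $w^\top(\mu_+ - \mu_-) = 1$; the worst-case correct-classification probability is then $\alpha = \kappa^2/(1+\kappa^2)$, with $\kappa$ the reciprocal of the optimal objective. A minimal sketch of this shallow MPM step (DeepMPM's end-to-end deep variant is not reproduced here):

```python
import numpy as np
from scipy.optimize import minimize

def mpm_hyperplane(mu_pos, mu_neg, cov_pos, cov_neg):
    """Classical MPM: minimize the worst-case misclassification bound
    sqrt(w' S+ w) + sqrt(w' S- w)  subject to  w'(mu+ - mu-) = 1."""
    diff = mu_pos - mu_neg
    obj = lambda w: np.sqrt(w @ cov_pos @ w) + np.sqrt(w @ cov_neg @ w)
    con = {"type": "eq", "fun": lambda w: w @ diff - 1.0}
    w0 = diff / (diff @ diff)                      # feasible starting point
    w = minimize(obj, w0, constraints=[con]).x
    kappa = 1.0 / obj(w)                           # optimal margin parameter
    alpha = kappa**2 / (1.0 + kappa**2)            # worst-case correct-classification prob.
    b = w @ mu_pos - kappa * np.sqrt(w @ cov_pos @ w)  # decision threshold
    return w, b, alpha
```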
A central tool in the study of nonhomogeneous random matrices, the noncommutative Khintchine inequality of Lust-Piquard and Pisier, yields a nonasymptotic bound on the spectral norm of general Gaussian random matrices $X=\sum_i g_i A_i$, where the $g_i$ are independent standard Gaussian variables and the $A_i$ are matrix coefficients. This bound exhibits a logarithmic dependence on dimension that is sharp when the matrices $A_i$ commute, but often proves to be suboptimal in the presence of noncommutativity. In this paper, we develop nonasymptotic bounds on the spectrum of arbitrary Gaussian random matrices that can capture noncommutativity. These bounds quantify the degree to which the deterministic matrices $A_i$ behave as though they are freely independent. This intrinsic freeness phenomenon provides a powerful tool for the study of various questions that are outside the reach of classical methods of random matrix theory. Our nonasymptotic bounds are easily applicable in concrete situations, and yield sharp results in examples where the noncommutative Khintchine inequality is suboptimal. When combined with a linearization argument, our bounds imply strong asymptotic freeness (in the sense of Haagerup-Thorbjørnsen) for a remarkably general class of Gaussian random matrix models, including matrices that may be very sparse and that lack any special symmetries. Beyond the Gaussian setting, we develop matrix concentration inequalities that capture noncommutativity for general sums of independent random matrices, which arise in many problems of pure and applied mathematics.
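
To make the objects concrete, here is a small numerical sketch (an illustrative example, not from the paper): it samples $X = \sum_i g_i A_i$ and compares its empirical spectral norm with the natural parameter $\sigma(X) = \max(\|\sum_i A_i A_i^\top\|, \|\sum_i A_i^\top A_i\|)^{1/2}$ that enters the noncommutative Khintchine bound $\|X\| \lesssim \sigma(X)\sqrt{\log d}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 50
A = rng.standard_normal((n, d, d)) / np.sqrt(n * d)  # arbitrary coefficients (illustrative choice)

# Gaussian series X = sum_i g_i A_i
g = rng.standard_normal(n)
X = np.einsum("i,ijk->jk", g, A)

# natural parameter sigma(X) from the noncommutative Khintchine inequality
S1 = sum(Ai @ Ai.T for Ai in A)
S2 = sum(Ai.T @ Ai for Ai in A)
sigma = np.sqrt(max(np.linalg.norm(S1, 2), np.linalg.norm(S2, 2)))

print("||X||     =", np.linalg.norm(X, 2))
print("sigma(X)  =", sigma)
# NCK-scale bound, up to a universal constant:
print("sigma * sqrt(log d) =", sigma * np.sqrt(np.log(d)))
```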
Recent focus on robustness to adversarial attacks for deep neural networks has produced a large variety of algorithms for training robust models. Most of the effective algorithms involve solving a min-max optimization problem to train robust models (the min step) under worst-case attacks (the max step). However, they often suffer from high computational cost due to running several inner maximization iterations (to find an optimal attack) inside every outer minimization iteration, which makes it difficult to apply such algorithms readily to moderate- to large-scale real-world datasets. To alleviate this, we explore the effectiveness of iterative descent-ascent algorithms in which the maximization and minimization steps are executed alternately to obtain the worst-case attack and the corresponding robust model simultaneously. Specifically, we propose a novel discrete-time dynamical-system-based algorithm that aims to find the saddle point of a min-max optimization problem in the presence of uncertainties. Under the assumptions that the cost function is convex and that the uncertainties enter concavely in the robust learning problem, we analytically show that our algorithm converges asymptotically to the robust optimal solution under general adversarial budget constraints induced by the $\ell_p$ norm, for $1 \leq p \leq \infty$. Based on this analysis, we devise a fast robust training algorithm for deep neural networks. Although such training involves highly non-convex robust optimization problems, empirical results show that the algorithm achieves significant robustness compared to other state-of-the-art robust models on benchmark datasets.
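
As a toy illustration of the alternating descent-ascent idea (a generic projected gradient descent-ascent on an assumed convex-concave objective, not the paper's dynamical-system algorithm): take $f(w,\delta) = \tfrac12\|w\|^2 + w^\top\delta - \tfrac12\|\delta\|^2$ with budget $\|\delta\|_\infty \le \epsilon$; alternating projected steps converge to the saddle point at the origin.

```python
import numpy as np

def gda_saddle(lr=0.1, eps=0.5, steps=500, d=5):
    """Projected gradient descent-ascent on the convex-concave toy
    f(w, delta) = 0.5*||w||^2 + w @ delta - 0.5*||delta||^2,
    with adversarial budget ||delta||_inf <= eps."""
    rng = np.random.default_rng(1)
    w = rng.standard_normal(d)
    delta = np.clip(rng.standard_normal(d), -eps, eps)
    for _ in range(steps):
        w = w - lr * (w + delta)          # descent step in the model variable
        delta = delta + lr * (w - delta)  # ascent step in the uncertainty
        delta = np.clip(delta, -eps, eps) # project onto the ell_inf ball
    return w, delta

w, delta = gda_saddle()
print(np.allclose(w, 0, atol=1e-6), np.allclose(delta, 0, atol=1e-6))  # True True: saddle at origin
```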
