Max-Affine Spline Insights Into Deep Network Pruning


Published by: Haoran You

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Randall Balestriero, Haoran You, Zhihan Lu

Machine Learning, Artificial Intelligence


Abstract

In this paper, we study the importance of pruning in Deep Networks (DNs) and the yin & yang relationship between (1) pruning highly overparametrized DNs that have been trained from random initialization and (2) training small DNs that have been cleverly initialized. Because practitioners can, in most cases, only resort to random initialization, there is a strong need to develop a grounded understanding of DN pruning. The current literature remains largely empirical, lacking a theoretical understanding of how pruning affects a DN's decision boundary, how to interpret pruning, and how to design corresponding principled pruning techniques. To tackle those questions, we propose to employ recent advances in the theoretical analysis of Continuous Piecewise Affine (CPA) DNs. From this perspective, we are able to detect the early-bird (EB) ticket phenomenon, provide interpretability into current pruning techniques, and develop a principled pruning strategy. In each step of our study, we conduct extensive experiments supporting our claims and results. While our main goal is to enhance the current understanding of DN pruning rather than to develop a new pruning method, our spline pruning criteria for layerwise and global pruning are on par with or even outperform state-of-the-art pruning methods.
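The abstract distinguishes layerwise from global pruning without giving details. The following minimal sketch illustrates only that distinction. It assumes plain weight magnitude as the pruning score, which stands in for the paper's spline criterion (not specified in the abstract); the function name `magnitude_masks` and the layer names are hypothetical.

```python
def magnitude_masks(layers, sparsity, scope="global"):
    """Return a keep-mask (list of bools) for every layer.

    layers:   dict mapping a layer name to a flat list of weights.
    sparsity: fraction of weights to remove, in [0, 1).
    scope:    "layerwise" ranks weights inside each layer separately;
              "global" ranks all weights of the network together.
    NOTE: |w| is an assumed stand-in score, not the paper's spline criterion.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if scope == "layerwise":
        groups = [[name] for name in layers]   # one ranking per layer
    elif scope == "global":
        groups = [list(layers)]                # a single ranking for all layers
    else:
        raise ValueError("scope must be 'layerwise' or 'global'")

    masks = {name: [True] * len(ws) for name, ws in layers.items()}
    for group in groups:
        # Collect (score, layer, index) for every weight competing in this group.
        scored = sorted(
            ((abs(w), name, i) for name in group for i, w in enumerate(layers[name])),
            key=lambda item: item[0],
        )
        n_prune = int(sparsity * len(scored))  # weights removed from this group
        for _, name, i in scored[:n_prune]:
            masks[name][i] = False
    return masks


# Toy two-layer example: the second layer has much larger weights.
layers = {"fc1": [0.1, -2.0, 3.0, -0.2], "fc2": [10.0, -20.0, 30.0, 40.0]}
layerwise = magnitude_masks(layers, 0.5, scope="layerwise")
global_ = magnitude_masks(layers, 0.5, scope="global")
print(layerwise)
print(global_)
```

In this toy case, layerwise pruning removes half of each layer, whereas global pruning removes every weight of `fc1` because all of its magnitudes fall below those of `fc2`. This is one reason the choice of score and scope matters in practice.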


146 - Huan Wang, Can Qin, Yulun Zhang 2021

Over-parameterization of neural networks benefits optimization and generalization, yet brings cost in practice. Pruning is adopted as a post-processing solution to this problem, aiming to remove unnecessary parameters from a neural network with little loss in performance. It has been broadly believed that the resulting sparse neural network cannot be trained from scratch to comparable accuracy. However, several recent works (e.g., [Frankle and Carbin, 2019a]) challenge this belief by discovering random sparse networks which can be trained to match the performance of their dense counterparts. This new pruning paradigm later inspired more new methods of pruning at initialization. In spite of the encouraging progress, how to coordinate these new pruning fashions with traditional pruning has not been explored yet. This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated well with the traditional one. With it, we systematically reflect the major differences and new insights brought by these new pruning fashions, with representative works discussed at length. Finally, we summarize the open questions as worthy future directions.

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

Structured Deep Neural Network Pruning via Matrix Pivoting

261 - Ranko Sredojevic, Shaoyi Cheng, Lazar Supic 2017

Deep Neural Networks (DNNs) are the key to state-of-the-art machine vision, sensor fusion and audio/video signal processing. Unfortunately, their computational complexity and the tight resource constraints on the Edge make them hard to leverage on mobile, embedded and IoT devices. Due to the great diversity of Edge devices, DNN designers have to take into account the hardware platform and application requirements during network training. In this work we introduce pruning via matrix pivoting as a way to improve network pruning by compromising between the design flexibility of architecture-oblivious pruning and the performance efficiency of architecture-aware pruning, the two dominant techniques for obtaining resource-efficient DNNs. We also describe local and global network optimization techniques for efficient implementation of the resulting pruned networks. In combination, the proposed pruning and implementation result in a close to linear speed-up with the reduction of network coefficients during pruning.

Machine Learning

Insights into exfoliation possibility of MAX phases to MXenes

118 - Mohammad Khazaei, Ahmad Ranjbar, Keivan Esfarjani 2018

Chemical exfoliation of MAX phases into two-dimensional (2D) MXenes can be considered a major breakthrough in the synthesis of novel 2D systems. To gain insight into the exfoliation possibility of MAX phases and to identify which MAX phases are promising candidates for successful exfoliation into 2D MXenes, we perform extensive electronic structure and phonon calculations, and determine the force constants, bond strengths, and static exfoliation energies of MAX phases to MXenes for 82 different experimentally synthesized crystalline MAX phases. Our results show a clear correlation between the force constants and the bond strengths. The smaller the total force constant of an A atom contributed from the neighboring atoms, the smaller the exfoliation energy, which makes exfoliation easier. We propose 37 MAX phases for successful exfoliation into 2D Ti$_2$C, Ti$_3$C$_2$, Ti$_4$C$_3$, Ti$_5$C$_4$, Ti$_2$N, Zr$_2$C, Hf$_2$C, V$_2$C, V$_3$C$_2$, V$_4$C$_3$, Nb$_2$C, Nb$_5$C$_4$, Ta$_2$C, Ta$_5$C$_4$, Cr$_2$C, Cr$_2$N, and Mo$_2$C MXenes. In addition, we explore the effect of charge injection on MAX phases. We find that the injected charges, both electrons and holes, are mainly received by the transition metals. This is because the states near the Fermi energy of MAX phases are mainly dominated by the $d$ orbitals of the transition metals. For negatively charged MAX phases, the injected electrons cause swelling of the structure and elongation of the bond distances along the $c$ axis, which weakens the binding. For positively charged MAX phases, on the other hand, the bonds become shorter and stronger. Therefore, we predict that electron injection by electrochemistry or gating techniques can significantly facilitate the exfoliation of MAX phases into 2D MXenes.

Materials Science

Dynamical Isometry: The Missing Ingredient for Neural Network Pruning

163 - Huan Wang, Can Qin, Yue Bai 2021

Several recent works [40, 24] observed an interesting phenomenon in neural network pruning: a larger finetuning learning rate can improve the final performance significantly. Unfortunately, the reason behind it remains elusive to date. This paper is meant to explain it through the lens of dynamical isometry [42]. Specifically, we examine neural network pruning from an unusual perspective, pruning as initialization for finetuning, and ask whether the inherited weights serve as a good initialization for the finetuning. The insights from dynamical isometry suggest a negative answer. Despite its critical role, this issue has not been well recognized by the community so far. In this paper, we show that understanding this problem is very important: on top of explaining the aforementioned mystery about the larger finetuning rate, it also unveils the mystery about the value of pruning [5, 30]. Besides a clearer theoretical understanding of pruning, resolving the problem can also bring considerable performance benefits in practice.

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

Adversarial Examples on Graph Data: Deep Insights into Attack and Defense

258 - Huijun Wu, Chen Wang, Yuriy Tyshetskiy 2019

Graph deep learning models, such as graph convolutional networks (GCNs), achieve remarkable performance on tasks involving graph data. Like other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, graph connections and different definitions of imperceptible perturbations bring unique challenges and opportunities for adversarial attacks and defenses on graph data. In this paper, we propose both attack and defense techniques. For attack, we show that the discreteness problem can easily be resolved by introducing integrated gradients, which accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For defense, we observe that the adversarially manipulated graph for the targeted attack differs statistically from normal graphs. Based on this observation, we propose a defense approach which inspects the graph and recovers the potential adversarial perturbations. Our experiments on a number of datasets show the effectiveness of the proposed methods.

Machine Learning, Cryptography and Security