How Does BN Increase Collapsed Neural Network Filters?

54 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xinjiang Wang

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Sheng Zhou - Xinjiang Wang - Ping Luo

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Improving sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularizations such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. This phenomenon becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to the square of learning rate and inversely proportional to the square of the scale parameter of BN. To prevent the undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and increase the model performance in various tasks such as classification on CIFAR-10 and object detection on MS-COCO2017.

قيم البحث

210 - Linjun Zhang , Zhun Deng , Kenji Kawaguchi 2020

Mixup is a popular data augmentation technique based on taking convex combinations of pairs of examples and their labels. This simple technique has been shown to substantially improve both the robustness and the generalization of the trained model. H owever, it is not well-understood why such improvement occurs. In this paper, we provide theoretical analysis to demonstrate how using Mixup in training helps model robustness and generalization. For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss. This explains why models obtained by Mixup training exhibits robustness to several kinds of adversarial attacks such as Fast Gradient Sign Method (FGSM). For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization which reduces overfitting. Our analysis provides new insights and a framework to understand Mixup.

التعلم الآلي التعلم الالي

How Does Supernet Help in Neural Architecture Search?

223 - Yuge Zhang , Quanlu Zhang , Yaming Yang 2020

Weight sharing, as an approach to speed up architecture performance estimation has received wide attention. Instead of training each architecture separately, weight sharing builds a supernet that assembles all the architectures as its submodels. Howe ver, there has been debate over whether the NAS process actually benefits from weight sharing, due to the gap between supernet optimization and the objective of NAS. To further understand the effect of weight sharing on NAS, we conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS. We find that weight sharing works well on some search spaces but fails on others. Taking a step forward, we further identified biases accounting for such phenomenon and the capacity of weight sharing. Our work is expected to inspire future NAS researchers to better leverage the power of weight sharing.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الحوسبة العصبية والتطورية

Graph Neural Network with Automorphic Equivalence Filters

71 - Fengli Xu , Quanming Yao , Pan Hui 2020

Graph neural network (GNN) has recently been established as an effective representation learning framework on graph data. However, the popular message passing models rely on local permutation invariant aggregate functions, which gives rise to the con cerns about their representational power. Here, we introduce the concept of automorphic equivalence to theoretically analyze GNNs expressiveness in differentiating nodes structural role. We show that the existing message passing GNNs have limitations in learning expressive representations. Moreover, we design a novel GNN class that leverages learnable automorphic equivalence filters to explicitly differentiate the structural roles of each nodes neighbors, and uses a squeeze-and-excitation module to fuse various structural information. We theoretically prove that the proposed model is expressive in terms of generating distinct representations for nodes with different structural feature. Besides, we empirically validate our model on eight real-world graph data, including social network, e-commerce co-purchase network and citation network, and show that it consistently outperforms strong baselines.

التعلم الآلي الذكاء الاصطناعي

How Does Data Augmentation Affect Privacy in Machine Learning?

291 - Da Yu , Huishuai Zhang , Wei Chen 2020

It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks to utilize the information of augmented data. MI att ack is widely used to measure the models information leakage of the training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i.e., classifying a set of augmented instances instead of a single data point, and design input permutation invariant features. Empirically, we demonstrate that the proposed approach universally outperforms original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve a 70.1% MI attack success rate on CIFAR10 against a wide residual network while the previous best approach only attains 61.9%. This suggests the privacy risk of models trained with data augmentation could be largely underestimated.

التعلم الآلي التعلم الالي

Neural Network Branching for Neural Network Verification

97 - Jingyue Lu , M. Pawan Kumar 2019

Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framewor k for designing an effective branching strategy for BaB. Specifically, we learn a graph neural network (GNN) to imitate the strong branching heuristic behaviour. Our framework differs from previous methods for learning to branch in two main aspects. Firstly, our framework directly treats the neural network we want to verify as a graph input for the GNN. Secondly, we develop an intuitive forward and backward embedding update schedule. Empirically, our framework achieves roughly $50%$ reduction in both the number of branches and the time required for verification on various convolutional networks when compared to the best available hand-designed branching strategy. In addition, we show that our GNN model enjoys both horizontal and vertical transferability. Horizontally, the model trained on easy properties performs well on properties of increased difficulty levels. Vertically, the model trained on small neural networks achieves similar performance on large neural networks.

التعلم الآلي التعلم الالي