ﻻ يوجد ملخص باللغة العربية
Improving sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularizations such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. This phenomenon becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to the square of learning rate and inversely proportional to the square of the scale parameter of BN. To prevent the undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and increase the model performance in various tasks such as classification on CIFAR-10 and object detection on MS-COCO2017.
Mixup is a popular data augmentation technique based on taking convex combinations of pairs of examples and their labels. This simple technique has been shown to substantially improve both the robustness and the generalization of the trained model. H
Weight sharing, as an approach to speed up architecture performance estimation has received wide attention. Instead of training each architecture separately, weight sharing builds a supernet that assembles all the architectures as its submodels. Howe
Graph neural network (GNN) has recently been established as an effective representation learning framework on graph data. However, the popular message passing models rely on local permutation invariant aggregate functions, which gives rise to the con
It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks to utilize the information of augmented data. MI att
Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framewor