In this paper, we explore techniques centered on periodic sampling of model weights that improve the convergence of gradient update methods (vanilla SGD, Momentum, Adam) on a variety of vision problems (classification, detection, segmentation). Importantly, our algorithms provide better, faster, and more robust convergence and training performance with only a slight increase in computation time. Our techniques are independent of the neural network model, the gradient optimization method, and existing optimal training policies, and they converge in a less volatile fashion, with performance improvements that are approximately monotonic. We conduct a variety of experiments to quantify these improvements and identify scenarios where these techniques could be especially useful.
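Since the abstract does not spell out the sampling rule, the following is a hedged, minimal sketch of one way to realize periodic weight sampling: snapshot the weights every k optimizer steps and keep a running average. The class name, the interval, and the averaging rule are illustrative assumptions, not the paper's exact algorithm.

    # Minimal sketch of periodic weight sampling via a running average
    # of snapshots taken every `sample_every` optimizer steps.
    import copy

    class PeriodicWeightSampler:
        def __init__(self, model, sample_every=100):
            self.model = model
            self.sample_every = sample_every
            self.avg_state = None      # running average of sampled weights
            self.num_samples = 0
            self.step_count = 0

        def step(self):
            """Call once per optimizer step; snapshots weights periodically."""
            self.step_count += 1
            if self.step_count % self.sample_every != 0:
                return
            state = copy.deepcopy(self.model.state_dict())
            if self.avg_state is None:
                self.avg_state = state
                self.num_samples = 1
                return
            self.num_samples += 1
            for k, v in state.items():
                if v.is_floating_point():
                    # incremental mean: avg += (w - avg) / n
                    self.avg_state[k] += (v - self.avg_state[k]) / self.num_samples
                else:
                    self.avg_state[k] = v  # e.g. integer counters: keep latest

        def load_averaged(self):
            """Overwrite the live model with the averaged weights."""
            if self.avg_state is not None:
                self.model.load_state_dict(self.avg_state)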
Generative adversarial networks (GANs) have shown remarkable results in image generation tasks. High-fidelity class-conditional GAN methods often rely on stabilization techniques that constrain the global Lipschitz continuity. Such regularization lea…
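The abstract is truncated, but Lipschitz-constraining stabilization in class-conditional GANs is commonly implemented via spectral normalization of the discriminator's weights; the sketch below shows that common approach, with layer sizes and the 32x32 RGB input as illustrative assumptions rather than this paper's architecture.

    # Hedged sketch: spectral normalization is one common way to constrain
    # the global Lipschitz constant of a GAN discriminator.
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    discriminator = nn.Sequential(
        spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),   # 32x32 -> 16x16
        nn.LeakyReLU(0.2),
        spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)), # 16x16 -> 8x8
        nn.LeakyReLU(0.2),
        nn.Flatten(),
        spectral_norm(nn.Linear(128 * 8 * 8, 1)),  # real/fake score
    )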
The inductive bias of a neural network is largely determined by its architecture and training algorithm. To achieve good generalization, it is therefore important to train the network effectively. We propose a novel orthogonal over-parameterized…
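The abstract cuts off before defining the method, so the following is only one plausible reading, sketched under loud assumptions: keep a fixed, randomly initialized weight W0 and learn an orthogonal factor R, so the effective weight is R @ W0. The class name and the factorization are illustrative, not the paper's definition.

    # Hedged sketch of one reading of "orthogonal over-parameterization":
    # the effective weight is R @ W0, where W0 is fixed at initialization
    # and only the orthogonal factor R is learned.
    import torch
    import torch.nn as nn
    from torch.nn.utils.parametrizations import orthogonal

    class OrthogonalOverparamLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            w0 = torch.empty(out_features, in_features)
            nn.init.kaiming_uniform_(w0)
            self.register_buffer("w0", w0)  # fixed; excluded from training
            # orthogonal() keeps this square weight orthogonal during training
            self.rotation = orthogonal(nn.Linear(out_features, out_features, bias=False))

        def forward(self, x):
            weight = self.rotation.weight @ self.w0  # effective weight R @ W0
            return x @ weight.t()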
Neural networks are commonly used as classification models for a wide variety of tasks. Typically, a learned affine transformation is placed at the end of such models, yielding a per-class value used for classification. This classifier can have a…
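As a point of reference, the standard setup the abstract describes looks like this minimal sketch, where the backbone and all sizes are illustrative placeholders:

    # Minimal sketch: a feature extractor followed by a learned affine
    # transformation producing per-class scores.
    import torch.nn as nn

    feature_dim, num_classes = 512, 10
    backbone = nn.Sequential(              # stand-in for any feature extractor
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, feature_dim),
        nn.ReLU(),
    )
    classifier = nn.Linear(feature_dim, num_classes)  # final affine layer: W f + b
    model = nn.Sequential(backbone, classifier)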
Sparse deep neural networks have shown their advantages over dense models, with fewer parameters and higher computational efficiency. Here we demonstrate that constraining the synaptic weights to the unit Lp-sphere enables flexible control of the sparsity…
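The abstract is truncated, but the constraint it names can be illustrated with a hedged sketch: renormalize each weight row onto the unit Lp-sphere after every optimizer step. The function name, the per-row convention, and the choice of p are assumptions for illustration.

    # Hedged sketch: keep each row of a weight matrix on the unit
    # Lp-sphere by renormalizing after each gradient step. Smaller p
    # tends to favor sparser weight vectors.
    import torch

    @torch.no_grad()
    def renormalize_to_lp_sphere(weight, p=1.0, eps=1e-12):
        """Rescale each row of `weight` to unit Lp-norm."""
        norm = weight.abs().pow(p).sum(dim=1, keepdim=True).pow(1.0 / p)
        weight.div_(norm.clamp_min(eps))

    # usage, after each optimizer step (model.fc is a hypothetical layer):
    #   optimizer.step()
    #   renormalize_to_lp_sphere(model.fc.weight, p=1.0)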
Sparsification is an efficient approach to accelerating CNN inference, but it is challenging to exploit sparsity during training because the gradients involved change dynamically. In fact, an important observation shows that most…