The simplicity of gradient descent (GD) has made it the default method for training ever deeper and more complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function for which an alternative optimizer (ADM) achieves state-of-the-art results while GD performs poorly. Interestingly, GD obtains its best result with a smoother tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in shallow segmentation and their known global solvers. Our work suggests that network design and training should pay more attention to optimization methods.
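As a concrete illustration of the kind of loss this abstract refers to, here is a minimal PyTorch sketch of a regularized weakly-supervised segmentation loss: partial cross-entropy on scribbled pixels plus a quadratic relaxation of a uniform 4-connected Potts regularizer. The function names and the uniform-weight grid regularizer are illustrative assumptions, not the paper's exact loss or its ADM solver.

```python
import torch
import torch.nn.functional as F

def partial_ce(logits, scribbles):
    # Cross-entropy evaluated only on labeled (scribbled) pixels.
    # logits: (B, K, H, W); scribbles: (B, H, W) long, -1 marks unlabeled.
    mask = scribbles >= 0
    if not mask.any():
        return logits.new_zeros(())
    flat_logits = logits.permute(0, 2, 3, 1)[mask]   # (N_labeled, K)
    return F.cross_entropy(flat_logits, scribbles[mask])

def potts_relaxation(probs):
    # Quadratic relaxation of a uniform 4-connected Potts model:
    # expected fraction of label disagreements between neighboring pixels.
    # probs: (B, K, H, W) softmax outputs.
    h_pair = 1.0 - (probs[:, :, :, 1:] * probs[:, :, :, :-1]).sum(dim=1)
    v_pair = 1.0 - (probs[:, :, 1:, :] * probs[:, :, :-1, :]).sum(dim=1)
    return h_pair.mean() + v_pair.mean()

def regularized_loss(logits, scribbles, lam=0.5):
    probs = logits.softmax(dim=1)
    return partial_ce(logits, scribbles) + lam * potts_relaxation(probs)
```

With GD one simply backpropagates through regularized_loss; an ADM-style alternative instead alternates between a network update and a subproblem over a latent segmentation that a dedicated discrete/CRF solver can handle globally.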
Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization …
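To make the initialization sensitivity concrete, consider this toy NumPy sketch (illustrative only, not from the paper): plain GD on a one-dimensional non-convex function converges to different local minima depending on the random start.

```python
import numpy as np

def f(x):
    return x**4 - 3 * x**2 + x      # non-convex: two local minima

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gd(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(-2, 2, size=5)
# Different random initializations land in different basins of attraction,
# yielding minima of different quality.
for x0 in starts:
    xm = gd(x0)
    print(f"start {x0:+.2f} -> minimum {xm:+.3f}, f = {f(xm):+.3f}")
```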
Stein variational gradient descent (SVGD) and its variants have shown promising success in approximate inference for complex distributions. However, their empirical performance depends crucially on the choice of kernel. Unfortunately, RBF kernels …
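For reference, here is a minimal NumPy sketch of the standard SVGD update with an RBF kernel and the common median-bandwidth heuristic (the usual baseline, not this paper's proposed kernel):

```python
import numpy as np

def svgd_step(x, grad_logp, stepsize=0.1):
    # One SVGD update. x: (n, d) particles; grad_logp: (n, d) -> (n, d) score.
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # pairwise differences
    sq = (diff ** 2).sum(-1)                        # squared distances
    h = np.median(sq) / np.log(n + 1) + 1e-12       # median bandwidth heuristic
    k = np.exp(-sq / h)                             # RBF kernel matrix (symmetric)
    score = grad_logp(x)
    # The kernel-weighted score pulls particles toward high density; the
    # kernel-gradient term repels them so they spread over the distribution.
    attract = k @ score
    repulse = (2.0 / h) * (x * k.sum(1, keepdims=True) - k @ x)
    return x + stepsize * (attract + repulse) / n

# Example: sample a standard 2-D Gaussian, where grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 2)) * 3.0
for _ in range(200):
    particles = svgd_step(particles, lambda z: -z)
```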
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce …
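As a sketch of the outer-loop tuning such work aims to eliminate, here is a random search over the learning rate on a toy quadratic (illustrative only, not the paper's method):

```python
import numpy as np

def train(lr, steps=100):
    # Toy objective f(w) = ||w||^2 / 2 minimized by plain gradient descent.
    w = np.full(10, 5.0)
    for _ in range(steps):
        w -= lr * w                 # the gradient of f is w itself
    return 0.5 * (w ** 2).sum()     # final loss

rng = np.random.default_rng(0)
# Random search: sample learning rates log-uniformly, keep the best run.
candidates = 10 ** rng.uniform(-4, 0, size=20)
best_lr = min(candidates, key=train)
print(f"best learning rate: {best_lr:.4g}, loss: {train(best_lr):.3g}")
```

Each candidate requires a full training run, which is exactly the overhead that motivates automated, in-training alternatives.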
We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error …
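For context, the quantity such bounds control is the standard generalization gap; the definition below is the usual one (the paper's specific bound is truncated here):

```latex
% Generalization gap of a model w trained on a sample S = {z_1,...,z_n} ~ D^n:
\mathrm{gen}(w, S) \;=\;
  \underbrace{\mathbb{E}_{z \sim \mathcal{D}}\big[\ell(w, z)\big]}_{\text{population risk}}
  \;-\;
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \ell(w, z_i)}_{\text{empirical risk}}
```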
Minimization of regularized losses is a principled approach to weak supervision that is well established in deep learning in general. However, it is largely overlooked in semantic segmentation, currently dominated by methods mimicking full supervision via fake …
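Schematically, the regularized losses discussed in these abstracts take the following generic form (the notation, including λ, the regularizer R, and the labeled-pixel set Ω_L, is assumed here for illustration):

```latex
% Partial cross-entropy on the labeled pixels \Omega_L plus a regularizer on
% the soft segmentation S_\theta(I) produced by the network:
L(\theta) \;=\;
  \sum_{p \in \Omega_L} H\!\big(y_p,\; S_\theta(I)_p\big)
  \;+\; \lambda \, R\!\big(S_\theta(I)\big)
```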