No Arabic abstract
We present and mathematically analyze an online adjoint algorithm for the optimization of partial differential equations (PDEs). Traditional adjoint algorithms would typically solve a new adjoint PDE at each optimization iteration, which can be computationally costly. In contrast, an online adjoint algorithm updates the design variables in continuous-time and thus constantly makes progress towards minimizing the objective function. The online adjoint algorithm we consider is similar in spirit to the pseudo-time-stepping, one-shot method which has been previously proposed. Motivated by the application of such methods to engineering problems, we mathematically study the convergence of the online adjoint algorithm. The online adjoint algorithm relies upon a time-relaxed adjoint PDE which provides an estimate of the direction of steepest descent. The algorithm updates this estimate continuously in time, and it asymptotically converges to the exact direction of steepest descent as $t rightarrow infty$. We rigorously prove that the online adjoint algorithm converges to a critical point of the objective function for optimizing the PDE. Under appropriate technical conditions, we also prove a convergence rate for the algorithm. A crucial step in the convergence proof is a multi-scale analysis of the coupled system for the forward PDE, adjoint PDE, and the gradient descent ODE for the design variables.
In this paper, optimal actuator shape for nonlinear parabolic systems is discussed. The system under study is an abstract differential equation with a locally Lipschitz nonlinear part. A quadratic cost on the state and input of the system is considered. The existence of an optimal actuator shape has been established in the literature. This paper focuses on driving the optimality conditions for actuator shapes belonging to a Banach space. The application of the theory to the optimal actuator shape design for railway track model is considered.
We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $chi^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $chi^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $chi^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
Dynamical systems, for instance in model predictive control, often contain unknown parameters, which must be determined during system operation. Online or on-the-fly parameter identification methods are therefore necessary. The challenge of online methods is that one must continuously estimate parameters as experimental data becomes available. The existing techniques in the context of time-dependent partial differential equations exclude the case where the system depends nonlinearly on the parameters.Based on a model reference adaptive system approach, we present an online parameter identification method for nonlinear infinite-dimensional evolutionary system.
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of optimization and machine learning communities, both for the leverage of life-long information and for the profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisited existing adaptive gradient methods from a novel perspective, refreshing understanding of second moments. Our new perspective empowers us to attach the properties of second moments to the first moment iteration, and to propose a novel first moment optimizer, emph{Angle-Calibrated Moment method} (method). Our theoretical results show that method is able to achieve the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that method has a comparable convergence to SOTA Adam-type optimizers, and gains a better generalization performance in most cases.
We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical performance for functions of all types, from smooth to non-smooth, and under different noise regimes. To this end, we have developed Full-Low Evaluation methods, organized around two main types of iterations. The first iteration type is expensive in function evaluations, but exhibits good performance in the smooth and non-noisy cases. For the theory, we consider a line search based on an approximate gradient, backtracking until a sufficient decrease condition is satisfied. In practice, the gradient was approximated via finite differences, and the direction was calculated by a quasi-Newton step (BFGS). The second iteration type is cheap in function evaluations, yet more robust in the presence of noise or non-smoothness. For the theory, we consider direct search, and in practice we use probabilistic direct search with one random direction and its negative. A switch condition from Full-Eval to Low-Eval iterations was developed based on the values of the line-search and direct-search stepsizes. If enough Full-Eval steps are taken, we derive a complexity result of gradient-descent type. Under failure of Full-Eval, the Low-Eval iterations become the drivers of convergence yielding non-smooth convergence. Full-Low Evaluation methods are shown to be efficient and robust in practice across problems with different levels of smoothness and noise.