Let $\alpha, \beta \in (0,1)$ be such that at least one of them is irrational. We consider a random walk on the real line whose steps are $\alpha$ or $\beta$, each chosen with probability $1/2$. We prove that almost surely the $\alpha\beta$-orbit is uniformly distributed modulo one, and that the exponential sums along the orbit exhibit square-root cancellation. We also show that the exceptional set in the probability space, consisting of those walks that fail to be uniformly distributed modulo one, is large in terms of both topology and Hausdorff dimension.
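To fix ideas, one natural formalization of this setting (in our notation, which may differ in detail from the paper's) is

$$x_N = \sum_{n=1}^{N} \epsilon_n, \qquad \epsilon_n \ \text{i.i.d.}, \quad \mathbb{P}(\epsilon_n = \alpha) = \mathbb{P}(\epsilon_n = \beta) = \tfrac{1}{2}.$$

By Weyl's criterion, uniform distribution modulo one of $(x_n)$ is equivalent to, for every integer $k \neq 0$,

$$S_N(k) = \sum_{n=1}^{N} e^{2\pi i k x_n} = o(N) \quad \text{almost surely},$$

and square-root cancellation is the stronger bound $S_N(k) = O_{k,\varepsilon}(N^{1/2+\varepsilon})$ for every $\varepsilon > 0$.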
This paper focuses on the convergence problem of the emerging fractional-order gradient descent method and proposes three solutions to overcome it. In fact, the general fractional gradient method cannot converge to the true extreme point of the target function, which critically hampers its application. Because of the long-memory characteristic of the fractional derivative, the fixed memory principle is a natural first choice. Apart from the truncation of the memory length, two new methods are developed to achieve convergence: one is the truncation of the infinite series, and the other is the modification of the constant fractional order. Finally, six illustrative examples are presented to demonstrate the effectiveness and practicability of the proposed methods.
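As a rough sketch of the memory-truncation idea, and not the paper's actual algorithm, the snippet below runs one-dimensional descent along a Grünwald-Letnikov fractional difference whose history is cut off after a fixed number of terms; the order, step size, and memory length are illustrative choices of ours.

```python
def gl_fractional_diff(f, x, alpha, h=1e-3, memory=64):
    """Grunwald-Letnikov fractional difference of order alpha at x,
    truncated to `memory` terms (short-memory principle)."""
    total, w = f(x), 1.0
    for k in range(1, memory):
        w *= 1.0 - (alpha + 1.0) / k  # w = (-1)^k * binom(alpha, k), built recursively
        total += w * f(x - k * h)
    return total / h ** alpha

def fractional_gradient_descent(f, x0, alpha=0.9, lr=0.05, steps=200):
    """Descend a 1-D function along its truncated fractional difference."""
    x = x0
    for _ in range(steps):
        x -= lr * gl_fractional_diff(f, x, alpha)
    return x

# Toy check: with truncated memory the iterates settle near the true
# minimizer x = 1 of f(x) = (x - 1)^2.
print(fractional_gradient_descent(lambda x: (x - 1.0) ** 2, x0=3.0))
```

Truncating the history makes each update depend only on a sliding window of recent function values, which is what allows the iteration to admit a fixed point near the true extremum.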
The calculus of variations is a field of mathematical analysis born in 1687 with Newton's problem of minimal resistance; it is concerned with the maxima and minima of integral functionals. Finding the solution of such problems leads to solving the associated Euler-Lagrange equations. The subject has found many applications over the centuries, e.g., in physics, economics, engineering, and biology. Up to now, however, the theory of the calculus of variations has been confined to Newton's approach to calculus. Since in many applications negative values of admissible functions are not physically plausible, we propose here to develop an alternative calculus of variations based on the non-Newtonian approach first introduced by Grossman and Katz between 1967 and 1970. This calculus is defined, from the very beginning, for positive real numbers only, and is based on a (non-Newtonian) derivative that compares relative changes between a dependent positive variable and an independent variable that is also positive. In this way, the non-Newtonian calculus of variations we introduce here provides a natural framework for problems involving functions with positive images. Our main result is a first-order optimality condition of Euler-Lagrange type. The new calculus of variations complements the standard one in a nontrivial, multiplicative way, guaranteeing that solutions remain in the physically admissible positive range. An illustrative example is given.
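For orientation, the best-known member of the Grossman-Katz family is the geometric (multiplicative) calculus; whether the paper adopts exactly this derivative is our assumption, but for a positive differentiable function $f$ it reads

$$f^{*}(x) \;=\; \lim_{h \to 0} \left( \frac{f(x+h)}{f(x)} \right)^{1/h} \;=\; e^{\,f'(x)/f(x)},$$

so it measures the relative (percentage) rate of change of $f$, precisely the kind of comparison between positive variables described above.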
We propose to optimize neural networks with a uniformly distributed random learning rate. The associated stochastic gradient descent algorithm can be approximated by continuous stochastic equations and analyzed within the Fokker-Planck formalism. In the small learning rate regime, the training process is characterized by an effective temperature which depends on the average learning rate, the mini-batch size, and the momentum of the optimization algorithm. By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime, yielding better regularization without extra computational cost. We provide supporting evidence through experiments on both shallow fully-connected and deep convolutional neural networks for image classification on the MNIST and CIFAR-10 datasets.
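As a minimal sketch of the protocol, assuming the learning rate is redrawn at each step uniformly from an interval chosen to match a desired mean (the exact distribution parameters are our assumption, not the paper's), a momentum SGD loop might look like:

```python
import numpy as np

def sgd_random_lr(grad, w, lr_mean=0.01, momentum=0.9, steps=1000, seed=0):
    """SGD with momentum where each step's learning rate is drawn
    uniformly from [0, 2*lr_mean], so the average rate equals lr_mean."""
    rng = np.random.default_rng(seed)
    v = np.zeros_like(w)
    for _ in range(steps):
        eta = rng.uniform(0.0, 2.0 * lr_mean)  # fresh random rate each step
        v = momentum * v - eta * grad(w)
        w = w + v
    return w

# Toy usage: minimize the quadratic ||w||^2, whose gradient is 2w.
print(sgd_random_lr(lambda w: 2.0 * w, w=np.ones(5)))
```

The average rate, the mini-batch size (implicit in `grad`), and `momentum` are exactly the quantities that set the effective temperature in the analysis above.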
Fast and robust optimization algorithms are of critical importance in all areas of machine learning. This paper treats the task of designing optimization algorithms as an optimal control problem. Using regret as a metric for an algorithm's performance, we study the existence, uniqueness, and consistency of regret-optimal algorithms. By providing first-order optimality conditions for the control problem, we show that regret-optimal algorithms must satisfy a specific structure in their dynamics, which turns out to be equivalent to performing dual-preconditioned gradient descent on the value function generated by the regret. Using these optimal dynamics, we provide bounds on their rates of convergence to solutions of convex optimization problems. Although closed-form optimal dynamics cannot be obtained in general, we present fast numerical methods for approximating them, generating optimization algorithms which directly optimize their long-term regret. Lastly, these are benchmarked against commonly used optimization algorithms to demonstrate their effectiveness.
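For reference, a standard way to define the regret of an iterate sequence $(x_t)$ against a convex objective $f$ with minimizer $x^{\star}$ is (our notation; the paper's control-theoretic formulation may differ in detail)

$$\mathcal{R}(T) \;=\; \sum_{t=1}^{T} \big( f(x_t) - f(x^{\star}) \big),$$

and a regret-optimal algorithm chooses its dynamics to minimize this quantity over the horizon, which is what casts algorithm design as an optimal control problem.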