
Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning

 Added by Kenji Kawaguchi
Publication date: 2019
Research language: English





For nonconvex optimization in machine learning, this article proves that every local minimum achieves the globally optimal value of the perturbable gradient basis model at any differentiable point. As a result, nonconvex machine learning is theoretically as supported as convex machine learning with a handcrafted basis in terms of the loss at differentiable local minima, except in the case when a preference is given to the handcrafted basis over the perturbable gradient basis. The proofs of these results are derived under mild assumptions. Accordingly, the proven results are directly applicable to many machine learning models, including practical deep neural networks, without any modification of practical methods. Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights. A special case of our results also contributes to the theoretical foundation of representation learning.
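To make the geometric insight concrete, here is a minimal illustrative special case (a sketch under simplifying assumptions, not the paper's full statement, which uses the perturbable gradient basis and weaker conditions): assume the training loss is L(θ) = Σ_i ℓ(f(x_i; θ), y_i) with ℓ convex and differentiable in its first argument, and let θ̄ be a differentiable local minimum of L.

```latex
\begin{align*}
  g(\alpha) &:= \sum_{i} \ell\!\left( f(x_i;\bar\theta)
      + \alpha^{\top} \nabla_{\theta} f(x_i;\bar\theta),\; y_i \right)
      && \text{(convex model on the gradient basis at } \bar\theta\text{)}\\
  \nabla g(0) &= \sum_{i} \partial_{1}\ell\!\left(f(x_i;\bar\theta),\, y_i\right)\,
      \nabla_{\theta} f(x_i;\bar\theta)
      \;=\; \nabla_{\theta} L(\bar\theta) \;=\; 0
      && \text{(first-order condition at the local minimum)}\\
  \Rightarrow\; L(\bar\theta) &= g(0) \;=\; \min_{\alpha} g(\alpha)
      && \text{(convexity of } g \text{ in } \alpha\text{)}
\end{align*}
```

In words: the loss at such a local minimum already equals the globally optimal value of the convex model spanned by the gradient basis at that point; the paper's perturbable gradient basis extends this picture to a richer induced model class under milder assumptions.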



Related research

We introduce a novel Deep Reinforcement Learning (DRL) algorithm called Deep Quality-Value (DQV) Learning. DQV uses temporal-difference learning to train a Value neural network and uses this network to train a second Quality-Value network that learns to estimate state-action values. We first test DQV's update rules with Multilayer Perceptrons as function approximators on two classic RL problems, and then extend DQV with Deep Convolutional Neural Networks, Experience Replay, and Target Neural Networks to tackle four games of the Atari Arcade Learning Environment. Our results show that DQV learns significantly faster and better than Deep Q-Learning and Double Deep Q-Learning, suggesting that our algorithm can potentially be a better-performing synchronous temporal-difference algorithm than those currently present in DRL.
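The coupled update rules can be sketched in tabular form (a minimal illustration of the Quality-Value idea as described in the abstract; the actual DQV algorithm uses neural networks, experience replay, and target networks):

```python
import numpy as np

def dqv_tabular_update(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One QV-style temporal-difference update (tabular sketch of DQV's rules).

    Both tables bootstrap from the state-value estimate V(s'), so the
    Quality table Q learns state-action values from targets supplied by V.
    """
    target = r + gamma * V[s_next] * (1.0 - done)  # shared TD target built from V
    V[s] += alpha * (target - V[s])                # value update
    Q[s, a] += alpha * (target - Q[s, a])          # quality (state-action) update
    return V, Q

# Toy usage on a 5-state, 2-action problem.
V = np.zeros(5)
Q = np.zeros((5, 2))
V, Q = dqv_tabular_update(V, Q, s=0, a=1, r=1.0, s_next=2, done=0.0)
```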
In this paper, we propose FedGLOMO, the first (first-order) FL algorithm that achieves the optimal iteration complexity (i.e., matching the known lower bound) on smooth non-convex objectives without using clients' full gradients in each round. Our key algorithmic idea, which enables attaining this optimal complexity, is applying judicious momentum terms that promote variance reduction in both the local updates at the clients and the global update at the server. Our algorithm is also provably optimal even with compressed communication between the clients and the server, which is an important consideration in the practical deployment of FL algorithms. Our experiments illustrate the intrinsic variance-reduction effect of FedGLOMO, which implicitly suppresses client drift in heterogeneous data-distribution settings and promotes communication efficiency. As a prequel to FedGLOMO, we propose FedLOMO, which applies momentum only in the local client updates. We establish that FedLOMO enjoys improved convergence rates under common non-convex settings compared to prior work, and with fewer assumptions.
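The abstract does not spell out the update rules, so the following is only a generic sketch of the stated idea (momentum applied both in the local client updates and in the global server update), not the actual FedGLOMO or FedLOMO algorithm; all function names and hyperparameters here are illustrative:

```python
import numpy as np

def local_round(w, grad_fn, steps=5, lr=0.05, beta_local=0.9):
    """Client-side SGD with momentum (generic sketch, not FedGLOMO itself)."""
    m = np.zeros_like(w)
    w_local = w.copy()
    for _ in range(steps):
        g = grad_fn(w_local)
        m = beta_local * m + (1 - beta_local) * g   # momentum in the local update
        w_local -= lr * m
    return w_local

def server_round(w, client_grad_fns, v, lr_global=1.0, beta_global=0.9):
    """Server averages client updates and applies a global momentum step."""
    updates = [local_round(w, g) - w for g in client_grad_fns]
    delta = np.mean(updates, axis=0)                 # averaged pseudo-gradient
    v = beta_global * v + (1 - beta_global) * delta  # momentum in the global update
    return w + lr_global * v, v

# Toy usage: two clients with heterogeneous quadratic objectives 0.5*||w - t||^2.
targets = [np.array([1.0, -1.0]), np.array([3.0, 0.5])]
clients = [lambda w, t=t: w - t for t in targets]    # per-client gradients
w, v = np.zeros(2), np.zeros(2)
for _ in range(50):
    w, v = server_round(w, clients, v)
```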
In machine learning applications for online product offerings and marketing strategies, there are often hundreds or thousands of features available to build such models. Feature selection is one essential method in such applications, serving multiple objectives: improving prediction accuracy by eliminating irrelevant features, accelerating model training and prediction, reducing the monitoring and maintenance workload for the feature data pipeline, and providing better model interpretation and diagnosis capability. However, selecting an optimal feature subset from a large feature space is considered an NP-complete problem. The mRMR (Minimum Redundancy and Maximum Relevance) feature selection framework addresses this problem by selecting relevant features while controlling for redundancy within the selected features. This paper describes the approach to extend, evaluate, and implement the mRMR feature selection methods for classification problems in a marketing machine learning platform at Uber that automates the creation and deployment of targeting and personalization models at scale. This study first extends the existing mRMR methods by introducing a non-linear feature redundancy measure and a model-based feature relevance measure. Then an extensive empirical evaluation is performed for eight different feature selection methods, using one synthetic dataset and three real-world marketing datasets at Uber to cover different use cases. Based on the empirical results, the selected mRMR method is implemented in production for the marketing machine learning platform. A description of the production implementation is provided and an online experiment deployed through the platform is discussed.
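A greedy mRMR selection loop can be sketched as follows (a common mutual-information variant, not Uber's production implementation; the relevance and redundancy measures below are standard stand-ins for the model-based and non-linear measures the paper introduces):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy mRMR: pick features with high relevance to y and low
    redundancy (average mutual information) with already-selected features."""
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=0)   # relevance to the target
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in remaining:
            if selected:
                redundancy = np.mean([
                    mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                    for s in selected
                ])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy                # relevance minus redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(mrmr_select(X, y, k=3))
```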
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.
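For reference, the Euler discretization referred to above is, in the classical Langevin case, a gradient step plus injected Gaussian noise (a sketch of the standard unadjusted Langevin algorithm only, not the paper's more general diffusions; the step size, inverse temperature, and test function here are illustrative):

```python
import numpy as np

def langevin_minimize(f, grad_f, x0, step=1e-3, beta=5.0, n_steps=50_000, seed=0):
    """Euler discretization of the overdamped Langevin diffusion
    dX_t = -grad f(X_t) dt + sqrt(2/beta) dW_t, used as a randomized global optimizer."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = x.copy()
    for _ in range(n_steps):
        x = x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * rng.standard_normal(x.shape)
        if f(x) < f(best):          # keep the best iterate; the chain keeps fluctuating
            best = x.copy()
    return best

# Toy non-convex objective: an asymmetric double well whose global minimum
# lies in the left well (near x = -1), while the iterate starts in the right well.
f = lambda x: (x**2 - 1.0) ** 2 + 0.3 * x
grad_f = lambda x: 4.0 * x * (x**2 - 1.0) + 0.3
print(langevin_minimize(f, grad_f, x0=np.array([1.2])))
```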
K. Ito, T. Takeuchi (2009)
The minimum value function appearing in the Tikhonov regularization technique is very useful in determining the regularization parameter, both theoretically and numerically. In this paper, we discuss the properties of the minimum value function, propose an efficient method to determine the regularization parameter, and discuss a new criterion for its determination.
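For a linear least-squares problem, the minimum value function in question can be evaluated in closed form via the normal equations (a minimal numerical sketch under that standard setting; the paper's analysis and selection criterion are more general):

```python
import numpy as np

def tikhonov_value_function(A, b, alphas):
    """F(alpha) = min_x ||A x - b||^2 + alpha ||x||^2, evaluated on a grid of
    regularization parameters using the closed-form Tikhonov minimizer."""
    n = A.shape[1]
    values = []
    for alpha in alphas:
        x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)  # minimizer x_alpha
        values.append(np.sum((A @ x - b) ** 2) + alpha * np.sum(x ** 2))
    return np.array(values)

# Toy usage: inspect F(alpha) on a mildly ill-conditioned problem.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10)) @ np.diag(np.logspace(0, -4, 10))
b = A @ rng.normal(size=10) + 0.01 * rng.normal(size=30)
print(tikhonov_value_function(A, b, alphas=np.logspace(-8, 0, 5)))
```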
