
Newton-Stein Method: An optimization method for GLMs via Stein's Lemma

Added by Murat A. Erdogdu
Publication date: 2015
Language: English





We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients ($n \gg p \gg 1$). In this regime, optimization algorithms can benefit immensely from approximate second-order information. We propose an alternative way of constructing the curvature information by formulating it as an estimation problem and applying a Stein-type lemma, which allows further improvements through sub-sampling and eigenvalue thresholding. Our algorithm enjoys fast convergence rates, resembling those of second-order methods, with modest per-iteration cost. We provide a convergence analysis for the general case where the rows of the design matrix are samples from a sub-gaussian distribution, and we show that the convergence has two phases, a quadratic phase followed by a linear phase. Finally, we empirically demonstrate that our algorithm achieves the highest performance compared to various algorithms on several datasets.
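To make the core idea concrete, below is a minimal sketch of a Stein-type curvature estimate for logistic regression, assuming (approximately) zero-mean Gaussian-like covariates. For a GLM with cumulant function phi, Stein's lemma gives E[phi''(x^T b) x x^T] = mu2 * Sigma + mu4 * (Sigma b)(Sigma b)^T, which replaces the exact Hessian X^T D X / n. The function names, step sizes, and loop structure here are illustrative, not the authors' reference implementation, and the paper's eigenvalue thresholding of the covariance estimate is omitted for brevity.

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def newton_stein_logistic(X, y, steps=50, sub=1000, gamma=1.0, seed=0):
        """Illustrative Newton-Stein-style iteration for logistic regression.

        Assumes roughly centered covariates; the Stein-type estimate
        H ~ mu2 * Sigma + mu4 * (Sigma b)(Sigma b)^T stands in for the
        exact GLM Hessian X^T diag(phi'') X / n.
        """
        n, p = X.shape
        rng = np.random.default_rng(seed)
        # Sub-sampled covariance estimate (one of the paper's speed-ups).
        idx = rng.choice(n, size=min(sub, n), replace=False)
        Sigma = X[idx].T @ X[idx] / len(idx)
        b = np.zeros(p)
        for _ in range(steps):
            s = sigmoid(X @ b)
            grad = X.T @ (s - y) / n            # gradient of the negative log-likelihood
            mu2 = np.mean(s * (1 - s))          # estimate of E[phi''(<x, b>)]
            mu4 = np.mean(s * (1 - s) * (1 - 6 * s + 6 * s**2))  # E[phi''''(<x, b>)]
            v = Sigma @ b
            H = mu2 * Sigma + mu4 * np.outer(v, v)   # Stein-type curvature estimate
            b = b - gamma * np.linalg.solve(H, grad)
        return b

    # Toy usage with a synthetic Gaussian design (the setting the analysis assumes):
    rng = np.random.default_rng(1)
    X = rng.standard_normal((50_000, 20))
    beta_true = rng.standard_normal(20) / np.sqrt(20)
    y = (rng.random(50_000) < sigmoid(X @ beta_true)).astype(float)
    beta_hat = newton_stein_logistic(X, y, steps=20)

Because the estimate is a scaled covariance plus a rank-one term, a practical implementation can invert it in O(p^2) via the Sherman-Morrison formula; the dense solve above is kept for clarity.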



Related research

320 - Hao Liu, Yihao Feng, Yi Mao 2017
Policy gradient methods have achieved remarkable success in solving challenging reinforcement learning problems. However, they still often suffer from large variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce the variance of policy gradient methods. Motivated by Stein's identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches.
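As a toy illustration of the identity this method builds on, the following one-dimensional check uses a single Gaussian "policy" over actions, with hypothetical functions f and phi standing in for Q and the action-dependent baseline; it is not the paper's actor-critic implementation. Stein's identity guarantees E[score * phi(a)] = E[phi'(a)], so subtracting the baseline and adding its derivative keeps the gradient estimate unbiased while reducing variance.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.5, 1.0
    a = rng.normal(mu, sigma, size=200_000)   # actions from a Gaussian "policy"

    f = lambda a: (a - 1.0) ** 2       # stand-in for Q(s, a)
    phi = lambda a: (a - 1.0) ** 2     # baseline, here chosen equal to f for illustration
    dphi = lambda a: 2.0 * (a - 1.0)   # derivative of the baseline w.r.t. the action

    score = (a - mu) / sigma**2        # grad_mu log N(a; mu, sigma^2)

    g_plain = score * f(a)                        # vanilla score-function estimator
    g_stein = score * (f(a) - phi(a)) + dphi(a)   # Stein-corrected estimator

    print(g_plain.mean(), g_stein.mean())  # both ~ grad_mu E[f(a)] = 2*(mu - 1) = -1
    print(g_plain.var(), g_stein.var())    # variance drops sharply

When the baseline phi approximates Q well, the score term nearly cancels and the estimator behaves like a reparameterized gradient, which is where the variance reduction comes from.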
86 - Zhi Zeng, Fulei Ma 2020
This paper presents an efficient gradient projection-based method for structural topology optimization problems characterized by a nonlinear objective function minimized over a feasible region defined by bilateral bounds and a single linear equality constraint. The special structure of the constraints, together with heuristic engineering experience, is exploited to improve the scaling scheme, the projection, and the search step. In detail, gradient clipping and a modified projection of the search direction under certain conditions are used to improve the efficiency of the proposed method. In addition, an analytical solution is proposed to approximate this projection with negligible computation and memory cost, and the calculation of the search step is largely simplified. Benchmark problems, including the MBB beam, the force-inverter mechanism, and the 3D cantilever beam, are used to validate the effectiveness of the method. The proposed method is implemented in MATLAB and open-sourced for educational use.
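The projection in question, onto bilateral bounds intersected with a single linear equality (e.g. a volume constraint with uniform element volumes, an assumption made here for simplicity), has a standard bisection form. The sketch below is a generic illustration of that step, not the authors' MATLAB code:

    import numpy as np

    def project_box_equality(v, lo, hi, volume, tol=1e-10, iters=100):
        """Euclidean projection of v onto {x : lo <= x <= hi, sum(x) = volume}.

        Bisection on the Lagrange multiplier lam of the equality constraint:
        sum(clip(v - lam, lo, hi)) is monotone non-increasing in lam.
        Feasibility requires sum(lo) <= volume <= sum(hi).
        """
        lam_lo = np.min(v - hi)   # at this lam the clipped sum equals sum(hi)
        lam_hi = np.max(v - lo)   # at this lam the clipped sum equals sum(lo)
        for _ in range(iters):
            lam = 0.5 * (lam_lo + lam_hi)
            s = np.clip(v - lam, lo, hi).sum()
            if abs(s - volume) < tol:
                break
            if s > volume:
                lam_lo = lam
            else:
                lam_hi = lam
        return np.clip(v - lam, lo, hi)

    # Toy usage: project a trial density field onto a 40% volume fraction.
    x = project_box_equality(np.random.rand(1000), 0.0, 1.0, volume=400.0)

The paper's analytical approximation replaces this iterative multiplier search with a closed-form estimate; the bisection above is the baseline it speeds up.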
We present the Variational Adaptive Newton (VAN) method, a black-box optimization method especially suitable for explorative learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but it requires computations similar to those of continuous optimization methods. Our theoretical contribution reveals that VAN is a second-order method that unifies existing approaches from the distinct fields of continuous optimization, variational inference, and evolution strategies. Our experimental results show that VAN performs well on a wide variety of learning tasks. This work presents a general-purpose explorative learning method with the potential to improve learning in areas such as active learning and reinforcement learning.
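The abstract does not spell out the update, but the flavor of a second-order, distribution-based step can be sketched as follows. This is a schematic illustration only, not the published VAN algorithm: it assumes gradient access and uses Stein's lemma, E[grad^2 f] = S E[(x - mu) grad f(x)^T] for x ~ N(mu, inv(S)), to estimate curvature from sampled gradients.

    import numpy as np

    def van_style_minimize(grad_f, dim, steps=200, samples=16,
                           alpha=0.1, beta=0.1, seed=0):
        """Schematic VAN-style loop: maintain a Gaussian N(mu, inv(S)) used for
        exploration and move it with second-order-like steps."""
        rng = np.random.default_rng(seed)
        mu = np.zeros(dim)
        S = np.eye(dim)                    # precision of the search distribution
        for _ in range(steps):
            cov = np.linalg.inv(S)
            xs = rng.multivariate_normal(mu, cov, size=samples)
            gs = np.array([grad_f(x) for x in xs])
            g_bar = gs.mean(axis=0)        # estimate of E[grad f]
            H_hat = S @ (xs - mu).T @ gs / samples   # Stein estimate of E[grad^2 f]
            H_hat = 0.5 * (H_hat + H_hat.T)          # symmetrize the noisy estimate
            S = (1 - beta) * S + beta * H_hat        # precision tracks curvature
            mu = mu - alpha * np.linalg.solve(S, g_bar)
        return mu, S

    # Toy usage: minimize f(x) = 0.5 x^T A x with ill-conditioned A.
    A = np.diag([1.0, 10.0])
    mu, S = van_style_minimize(lambda x: A @ x, dim=2)

A practical implementation would safeguard positive definiteness of S (e.g. by eigenvalue clipping); the smoothing factor beta limits the damage from noisy curvature estimates here.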
Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely mappings that are optimal at the mini-batch level but do not appear in the optimal transport plan between the original measures. To address this issue, we propose a novel mini-batch method based on partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging insights from partial transportation, we explain the source of the misspecified mappings in m-OT and motivate why limiting the amount of mass transported between mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications to compare m-POT with m-OT and a recently proposed mini-batch method, mini-batch unbalanced optimal transport (m-UOT). We observe that m-POT outperforms m-OT in deep domain adaptation applications while having comparable performance to m-UOT. On other applications, such as deep generative models, gradient flows, and color transfer, m-POT yields more favorable performance than both m-OT and m-UOT.
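A minimal sketch of the m-POT computation is below, assuming the ot.partial.partial_wasserstein solver from the POT (Python Optimal Transport) package; the function and parameter names are illustrative and worth checking against the installed version's API, and only the averaged transport cost is computed (a training loop would differentiate through it).

    import numpy as np
    import ot  # POT: Python Optimal Transport

    def minibatch_partial_ot(X, Y, batch=128, mass=0.7, rounds=10, seed=0):
        """Average partial-OT cost over random mini-batch pairs (m-POT style).

        `mass` is the fraction of probability mass actually transported;
        mass = 1.0 recovers plain mini-batch OT (m-OT), while mass < 1 lets
        each batch drop the pairings that are misspecified globally.
        """
        rng = np.random.default_rng(seed)
        uniform = np.full(batch, 1.0 / batch)
        total = 0.0
        for _ in range(rounds):
            xb = X[rng.choice(len(X), batch, replace=False)]
            yb = Y[rng.choice(len(Y), batch, replace=False)]
            M = ot.dist(xb, yb)  # pairwise squared Euclidean cost matrix
            plan = ot.partial.partial_wasserstein(uniform, uniform, M, m=mass)
            total += (plan * M).sum()
        return total / rounds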
We consider a general preferential attachment model, in which the probability that a newly arriving vertex connects to an older vertex is proportional to a sublinear function of the indegree of the older vertex at that time. It is well known that the distribution of a uniformly chosen vertex converges to a limiting distribution. Depending on the parameters, this model can exhibit power-law but also stretched-exponential behaviour. Using Stein's method, we provide rates of convergence in total variation distance. Our proof uses the fact that the limiting distribution is the stationary distribution of a Markov chain, together with the generator method of Barbour.
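To make the model concrete, here is a small simulation of one standard variant: each arriving vertex sends a single edge to an earlier vertex chosen with probability proportional to f(indegree). The attachment function f(k) = sqrt(k + 1) is an illustrative sublinear choice, not one prescribed by the paper.

    import numpy as np

    def simulate_pa(n, f=lambda k: np.sqrt(k + 1.0), seed=0):
        """Sublinear preferential attachment: vertex t attaches to one of the
        vertices 0..t-1 chosen with probability proportional to f(indegree)."""
        rng = np.random.default_rng(seed)
        indeg = np.zeros(n, dtype=int)
        weight = np.empty(n)
        weight[0] = f(0)
        for t in range(1, n):
            w = weight[:t]
            target = rng.choice(t, p=w / w.sum())
            indeg[target] += 1
            weight[target] = f(indeg[target])
            weight[t] = f(0)   # each new vertex starts with indegree 0
        return indeg

    # Indegree distribution of a uniformly chosen vertex; for sublinear f this
    # decays like a stretched exponential rather than a power law.
    deg = simulate_pa(5_000)
    vals, counts = np.unique(deg, return_counts=True)
    print(list(zip(vals, counts / len(deg)))[:8])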
