ترغب بنشر مسار تعليمي؟ اضغط هنا

Regret Analysis of Global Optimization in Univariate Functions with Lipschitz Derivatives

127   0   0.0 ( 0 )
 نشر من قبل Kaan Gokcesu
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this work, we study the problem of global optimization in univariate loss functions, where we analyze the regret of the popular lower bounding algorithms (e.g., Piyavskii-Shubert algorithm). For any given time $T$, instead of the widely available simple regret (which is the difference of the losses between the best estimation up to $T$ and the global optimizer), we study the cumulative regret up to that time. With a suitable lower bounding algorithm, we show that it is possible to achieve satisfactory cumulative regret bounds for different classes of functions. For Lipschitz continuous functions with the parameter $L$, we show that the cumulative regret is $O(Llog T)$. For Lipschitz smooth functions with the parameter $H$, we show that the cumulative regret is $O(H)$. We also analytically extend our results for a broader class of functions that covers both the Lipschitz continuous and smooth functions individually.



قيم البحث

اقرأ أيضاً

68 - Tor Lattimore 2021
We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(langle x, thetarangle)$ for convex $g_t : mathbb R to mathbb R$ and unknown $theta in mathbb R^d$ that is homogeneou s over time. We provide a short information-theoretic proof that the minimax regret is at most $O(d sqrt{n} log(n operatorname{diam}(mathcal K)))$ where $n$ is the number of interactions, $d$ the dimension and $operatorname{diam}(mathcal K)$ is the diameter of the constraint set.
In this paper, the problem of safe global maximization (it should not be confused with robust optimization) of expensive noisy black-box functions satisfying the Lipschitz condition is considered. The notion safe means that the objective function $f( x)$ during optimization should not violate a safety threshold, for instance, a certain a priori given value $h$ in a maximization problem. Thus, any new function evaluation (possibly corrupted by noise) must be performed at safe points only, namely, at points $y$ for which it is known that the objective function $f(y) > h$. The main difficulty here consists in the fact that the used optimization algorithm should ensure that the safety constraint will be satisfied at a point $y$ before evaluation of $f(y)$ will be executed. Thus, it is required both to determine the safe region $Omega$ within the search domain~$D$ and to find the global maximum within $Omega$. An additional difficulty consists in the fact that these problems should be solved in the presence of the noise. This paper starts with a theoretical study of the problem and it is shown that even though the objective function $f(x)$ satisfies the Lipschitz condition, traditional Lipschitz minorants and majorants cannot be used due to the presence of the noise. Then, a $delta$-Lipschitz framework and two algorithms using it are proposed to solve the safe global maximization problem. The first method determines the safe area within the search domain and the second one executes the global maximization over the found safe region. For both methods a number of theoretical results related to their functioning and convergence is established. Finally, numerical experiments confirming the reliability of the proposed procedures are performed.
Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP. In this pape r, we show that logarithmic regret is attainable under two recently proposed linear MDP assumptions provided that there exists a positive sub-optimality gap for the optimal action-value function. More specifically, under the linear MDP assumption (Jin et al. 2019), the LSVI-UCB algorithm can achieve $tilde{O}(d^{3}H^5/text{gap}_{text{min}}cdot log(T))$ regret; and under the linear mixture MDP assumption (Ayoub et al. 2020), the UCRL-VTR algorithm can achieve $tilde{O}(d^{2}H^5/text{gap}_{text{min}}cdot log^3(T))$ regret, where $d$ is the dimension of feature mapping, $H$ is the length of episode, $text{gap}_{text{min}}$ is the minimal sub-optimality gap, and $tilde O$ hides all logarithmic terms except $log(T)$. To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation. We also establish gap-dependent lower bounds for the two linear MDP models.
We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over an infini te horizon. We propose a learning algorithm for this problem, building on spectral method-of-moments estimations for hidden Markov models, the belief error control in POMDPs and upper-confidence-bound methods for online learning. We establish a regret bound of $O(T^{2/3}sqrt{log T})$ for the proposed learning algorithm where $T$ is the learning horizon. This is, to the best of our knowledge, the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.
We introduce a new regularization technique, using what we refer to as the Steklov regularization function, and apply this technique to devise an algorithm that computes a global minimizer of univariate coercive functions. First, we show that the Ste klov regularization convexifies a given univariate coercive function. Then, by using the regularization parameter as the independent variable, a trajectory is constructed on the surface generated by the Steklov function. For monic quartic polynomials, we prove that this trajectory does generate a global minimizer. In the process, we derive some properties of quartic polynomials. Comparisons are made with a previous approach which uses a quadratic regularization function. We carry out numerical experiments to illustrate the working of the new method on polynomials of various degree as well as a non-polynomial function.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا