
Asymptotic Randomised Control with applications to bandits

Published by: Tanut Treetanthiploet
Publication date: 2020
Research field: Mathematical Statistics
Language: English





We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed point problem, which can be interpreted as explicitly balancing an exploration-exploitation trade-off. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.
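As a rough illustration of how an entropy premium can smooth a value function, the sketch below uses the standard fact that maximising expected reward plus `lam` times the Shannon entropy over probability vectors gives a log-sum-exp value, attained by a softmax distribution. This is a generic textbook construction and not the ARC algorithm itself; the names `smooth_max`, `softmax_policy`, `entropy` and the parameter `lam` are assumptions introduced here for illustration only.

```python
import math


def smooth_max(values, lam):
    """Entropy-regularised maximum: lam * log(sum(exp(v / lam))).

    Computed with the usual max-shift for numerical stability.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def softmax_policy(values, lam):
    """Randomised choice probabilities proportional to exp(v / lam)."""
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical estimated rewards for three arms.
    estimates = [1.0, 0.5, 0.2]
    for lam in (1.0, 0.1, 0.01):
        probs = softmax_policy(estimates, lam)
        print(lam, smooth_max(estimates, lam), [round(p, 4) for p in probs])
```

A larger `lam` spreads probability across arms (more exploration), while a smaller `lam` concentrates it on the arm with the highest estimate and brings the smoothed value closer to the hard maximum.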




Read also

This paper considers a distributed PI-controller for networked dynamical systems. Sufficient conditions for when the controller is able to stabilize a general linear system and eliminate static control errors are presented. The proposed controller is applied to frequency control of power transmission systems. Sufficient stability criteria are derived, and it is shown that the controller parameters can always be chosen so that the frequencies in the closed loop converge to nominal operational frequency. We show that the load sharing property of the generators is maintained, i.e., the input power of the generators is proportional to a controller parameter. The controller is evaluated by simulation on the IEEE 30 bus test network, where its effectiveness is demonstrated.
Pio Ong, Jorge Cortes, 2021
This paper proposes a novel framework for resource-aware control design termed performance-barrier-based triggering. Given a feedback policy, along with a Lyapunov function certificate that guarantees its correctness, we examine the problem of designing its digital implementation through event-triggered control while ensuring a prescribed performance is met and triggers occur as sparingly as possible. Our methodology takes into account the performance residual, i.e., how well the system is doing with regard to the prescribed performance. Inspired by the notion of control barrier function, the trigger design allows the certificate to deviate from monotonically decreasing, with leeway specified as an increasing function of the performance residual, resulting in greater flexibility in prescribing update times. We study different types of performance specifications, with particular attention to quantifying the benefits of the proposed approach in the exponential case. We build on this to design intrinsically Zeno-free distributed triggers for network systems. A comparison of event-triggered approaches in a vehicle platooning problem shows how the proposed design meets the prescribed performance with a significantly lower number of controller updates.
Vincent Andrieu, 2020
A nonlinear control system is said to be weakly contractive in the control if the flow that it generates is non-expanding (in the sense that the distance between two trajectories is a non-increasing function of time) for some fixed Riemannian metric independent of the control. We prove in this paper that for such systems, local asymptotic stabilizability implies global asymptotic stabilizability by means of a dynamic state feedback. We link this result and the so-called Jurdjevic and Quinn approach.
We study the asymptotic behaviour of a class of small-noise diffusions driven by fractional Brownian motion, with random starting points. Different scalings allow for different asymptotic properties of the process (small-time and tail behaviours in particular). In order to do so, we extend some results on sample path large deviations for such diffusions. As an application, we show how these results characterise the small-time and tail estimates of the implied volatility for rough volatility models, recently proposed in mathematical finance.
We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. Besides, we provide simulation results for synthetic examples where popular algorithms, e.g. ADAM, AMSGrad, RMSProp, and (vanilla) SGD, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution.