A Primal Condition for Approachability with Partial Monitoring

Published by Gilles Stoltz

Publication date: 2013

Research field: Informatics Engineering

Paper language: English

Authors: Shie Mannor, Gilles Stoltz (INRIA Paris - Rocquencourt)

In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only if all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of a certain form. We consider approachability in games with partial monitoring. In previous works (Perchet 2011; Mannor et al. 2011) we provided a dual characterization of approachable convex sets; we also exhibited efficient strategies in the case where C is a polytope. In this paper we provide primal conditions on a convex set to be approachable with partial monitoring. They depend on a modified reward function and lead to approachability strategies, based on modified payoff functions, that proceed by projections similarly to Blackwell's (1956) strategy; this is in contrast with previously studied strategies in this context that relied mostly on the signaling structure and aimed at estimating well the distributions of the signals received. Our results generalize classical results by Kohlberg (1975) (see also Mertens et al. 1994) and apply to games with arbitrary signaling structure as well as to arbitrary convex sets.

Xiaodong Cheng, Shengling Shi, Ioannis Lestas 2021

This paper considers dynamic networks where vertices and edges represent manifest signals and causal dependencies among the signals, respectively. We address the problem of how to determine if the dynamics of a network can be identified when only partial vertices are measured and excited. A necessary condition for network identifiability is presented, where the analysis is performed based on identifying the dependency of a set of rational functions from excited vertices to measured ones. This condition is further characterised by using an edge-removal procedure on the associated bipartite graph. Moreover, on the basis of necessity analysis, we provide a necessary and sufficient condition for identifiability in circular networks.

Optimization and Control

Exponential Stability of Partial Primal-Dual Gradient Dynamics with Nonsmooth Objective Functions

Zhaojian Wang, Wei Wei, Changhong Zhao 2020

In this paper, we investigate the continuous-time partial primal-dual gradient dynamics (P-PDGD) for solving convex optimization problems of the form $\min\limits_{x\in X,\, y\in\Omega} f(x)+h(y),\ \textit{s.t.}\ Ax+By=C$, where $f(x)$ is strongly convex and smooth, but $h(y)$ is strongly convex and non-smooth. Affine equality and set constraints are included. We prove the exponential stability of P-PDGD, and bounds on decaying rates are provided. Moreover, it is also shown that the decaying rates can be regulated by setting the stepsize.

Optimization and Control

Optimal preventive maintenance scheduling for wind turbines under condition monitoring

Quanjiang Yu, Pramod Bangalore, Sara Fogelstrom 2021

We suggest a mathematical model for computing and regularly updating the next preventive maintenance plan for a wind farm. Our optimization criterion takes into account the current ages of the key components, the major maintenance costs including eventual energy production losses, as well as the available data monitoring the condition of the wind turbines. We illustrate our approach with a case study based on data collected from several wind farms located in Sweden. Our results show that preventive maintenance planning has some effect if the wind turbine components in question have significantly shorter lifetimes than the turbine itself.

Optimization and Control

A PDE Approach to the Prediction of a Binary Sequence with Advice from Two History-Dependent Experts

Nadejda Drenska, Robert V. Kohn 2020

The prediction of a binary sequence is a classic example of online machine learning. We like to call it the stock prediction problem, viewing the sequence as the price history of a stock that goes up or down one unit at each time step. In this problem, an investor has access to the predictions of two or more experts, and strives to minimize her final-time regret with respect to the best-performing expert. Probability plays no role; rather, the market is assumed to be adversarial. We consider the case when there are two history-dependent experts, whose predictions are determined by the d most recent stock moves. Focusing on an appropriate continuum limit and using methods from optimal control, graph theory, and partial differential equations, we discuss strategies for the investor and the adversarial market, and we determine associated upper and lower bounds for the investor's final-time regret. When d is less than 4 our upper and lower bounds coalesce, so the proposed strategies are asymptotically optimal. Compared to other recent applications of partial differential equations to prediction, ours has a new element: there are two timescales, since the recent history changes at every step whereas regret accumulates more slowly.

Optimization and Control, Computer Science and Game Theory, Machine Learning

Convergence of gradient descent-ascent analyzed as a Newtonian dynamical system with dissipation

H. Sebastian Seung 2019

A dynamical system is defined in terms of the gradient of a payoff function. Dynamical variables are of two types, ascent and descent. The ascent variables move in the direction of the gradient, while the descent variables move in the opposite direction. Dynamical systems of this form or very similar forms have been studied in diverse fields such as game theory, optimization, neural networks, and population biology. Gradient descent-ascent is approximated as a Newtonian dynamical system that conserves total energy, defined as the sum of the kinetic energy and a potential energy that is proportional to the payoff function. The error of the approximation is a residual force that violates energy conservation. If the residual force is purely dissipative, then the energy serves as a Lyapunov function, and convergence of bounded trajectories to steady states is guaranteed. A previous convergence theorem due to Kose and Uzawa required the payoff function to be convex in the descent variables, and concave in the ascent variables. Here the assumption is relaxed, so that the payoff function need only be globally "less convex" or "more concave" in the ascent variables than in the descent variables. Such relative convexity conditions allow the existence of multiple steady states, unlike the convex-concave assumption. When combined with sufficient conditions that imply the existence of a minimax equilibrium, boundedness of trajectories is also assured.

Optimization and Control, Computer Science and Game Theory, Machine Learning