ترغب بنشر مسار تعليمي؟ اضغط هنا

Controlling Unknown Linear Dynamics with Bounded Multiplicative Regret

79   0   0.0 ( 0 )
 نشر من قبل Jacob Carruth
 تاريخ النشر 2021
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider a simple control problem in which the underlying dynamics depend on a parameter that is unknown and must be learned. We exhibit a control strategy which is optimal to within a multiplicative constant. While most authors find strategies which are successful as the time horizon tends to infinity, our strategy achieves lowest expected cost up to a constant factor for a fixed time horizon.



قيم البحث

اقرأ أيضاً

We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy with respe ct to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $tilde{mathcal{O}} big( n sqrt{T} big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $tilde{mathcal{O}}(cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
The k-regret query aims to return a size-k subset S of a database D such that, for any query user that selects a data object from this size-k subset S rather than from database D, her regret ratio is minimized. The regret ratio here is modeled by the relative difference in the optimality between the locally optimal object in S and the globally optimal object in D. The optimality of a data object in turn is modeled by a utility function of the query user. Unlike traditional top-k queries, the k-regret query does not minimize the regret ratio for a specific utility function. Instead, it considers a family of infinite utility functions F, and aims to find a size-k subset that minimizes the maximum regret ratio of any utility function in F. Studies on k-regret queries have focused on the family of additive utility functions, which have limitations in modeling individuals preferences and decision making processes, especially for a common observation called the diminishing marginal rate of substitution (DMRS). We introduce k-regret queries with multiplicative utility functions, which are more expressive in modeling the DMRS, to overcome those limitations. We propose a query algorithm with bounded regret ratios. To showcase the applicability of the algorithm, we apply it to a special family of multiplicative utility functions, the Cobb-Douglas family of utility functions, and a closely related family of utility functions, the Constant Elasticity of Substitution family of utility functions, both of which are frequently used utility functions in microeconomics. After a further study of the query properties, we propose a heuristic algorithm that produces even smaller regret ratios in practice. Extensive experiments on the proposed algorithms confirm that they consistently achieve small maximum regret ratios.
We study predictive control in a setting where the dynamics are time-varying and linear, and the costs are time-varying and well-conditioned. At each time step, the controller receives the exact predictions of costs, dynamics, and disturbances for th e future $k$ time steps. We show that when the prediction window $k$ is sufficiently large, predictive control is input-to-state stable and achieves a dynamic regret of $O(lambda^k T)$, where $lambda < 1$ is a positive constant. This is the first dynamic regret bound on the predictive control of linear time-varying systems. Under more assumptions on the terminal costs, we also show that predictive control obtains the first competitive bound for the control of linear time-varying systems: $1 + O(lambda^k)$. Our results are derived using a novel proof framework based on a perturbation bound that characterizes how a small change to the system parameters impacts the optimal trajectory.
60 - Fang Liu 2021
It is common to encounter the situation with uncertainty for decision makers (DMs) in dealing with a complex decision making problem. The existing evidence shows that people usually fear the extreme uncertainty named as the unknown. This paper report s the modified version of the typical regret theory by considering the fear experienced by DMs for the unknown. Based on the responses of undergraduate students to the hypothetical choice problems with an unknown outcome, some experimental evidences are observed and analyzed. The framework of the modified regret theory is established by considering the effects of an unknown outcome. A fear function is equipped and some implications are proved. The behavioral foundation of the modified regret theory is further developed by modifying the axiomatic properties of the existing one as those based on the utility function; and it is recalled as the utility-based behavioral foundation for convenience. The application to the medical decision making with an unknown risk is studied and the effects of the fear function are investigated. The observations reveal that the existence of an unknown outcome could enhance, impede or reverse the preference relation of people in a choice problem, which can be predicted by the developed regret theory.
52 - G.Guatteri , G.Tessitore 2019
In this paper we study an Ergodic Markovian BSDE involving a forward process $X$ that solves an infinite dimensional forward stochastic evolution equation with multiplicative and possibly degenerate diffusion coefficient. A concavity assumption on th e driver allows us to avoid the typical quantitative conditions relating the dissipativity of the forward equation and the Lipschitz constant of the driver. Although the degeneracy of the noise has to be of a suitable type we can give a stochastic representation of a large class of Ergodic HJB equations; morever our general results can be applied to get the synthesis of the optimal feedback law in relevant examples of ergodic control problems for SPDEs.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا