
Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

Published by Yunpeng Pan
Publication date: 2014
Research field: Informatics Engineering
Language: English





We present two nonparametric approaches to Kullback-Leibler (KL) control, also known as the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes (GP) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operation. Theoretically, for the infinite time horizon case, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities. Algorithmically, we give explicit optimal control policies in nonparametric form and propose on-line update schemes with budgeted computational cost. Numerical results demonstrate the effectiveness of the proposed frameworks.
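As background for the LMDP setting named in the abstract, the following is a minimal sketch of the standard discrete-state, infinite-horizon (average-cost) LMDP, not the paper's GP or Nyström method. It assumes a known passive transition matrix `P` and state cost vector `q`; the function name, the 5-state ring example, and all parameter values are hypothetical. In this formulation the desirability function `z = exp(-v)` is the principal eigenvector of `diag(exp(-q)) @ P`, and the optimal policy reweights the passive dynamics by `z`.

```python
import numpy as np

def solve_lmdp_average_cost(P, q, iters=1000, tol=1e-12):
    """Power iteration for the principal eigenpair of diag(exp(-q)) @ P.

    P : (n, n) row-stochastic passive dynamics (assumed known here).
    q : (n,) state costs.
    Returns the desirability z (normalized to sum to 1), the eigenvalue lam,
    and the optimal policy U with U[i, j] proportional to P[i, j] * z[j].
    """
    G = np.exp(-q)
    z = np.full(len(q), 1.0 / len(q))
    lam = 1.0
    for _ in range(iters):
        z_new = G * (P @ z)        # apply the linear operator diag(exp(-q)) P
        lam = z_new.sum()          # eigenvalue estimate, since z sums to 1
        z_new /= lam
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    U = P * z[None, :]             # reweight passive dynamics by desirability
    U /= U.sum(axis=1, keepdims=True)
    return z, lam, U

# Hypothetical example: lazy random walk on a 5-state ring, state 0 is cheapest.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25
q = np.array([0.0, 1.0, 2.0, 2.0, 1.0])

z, lam, U = solve_lmdp_average_cost(P, q)
average_cost = -np.log(lam)        # optimal average cost per step
```

The point of the sketch is that the Bellman equation becomes linear in `z`, so no explicit minimization over controls is needed; the nonparametric methods in the paper address the case where `z` must be approximated from data rather than computed on a small known state space.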




Read also

Bayesian nonparametric statistics is an area of considerable research interest. While there has recently been extensive work on developing Bayesian nonparametric procedures for model checking, the use of the Dirichlet process, in its simplest form, along with the Kullback-Leibler divergence is still an open problem. This is mainly attributed to the discreteness of the Dirichlet process and to the fact that the Kullback-Leibler divergence between any discrete distribution and any continuous distribution is infinite. The approach proposed in this paper, which is based on incorporating the Dirichlet process, the Kullback-Leibler divergence and the relative belief ratio, is considered the first concrete solution to this issue. Applying the approach is simple and does not require obtaining a closed form of the relative belief ratio. A Monte Carlo study and real data examples show that the developed approach exhibits excellent performance.
A stochastic model predictive control (SMPC) approach is presented for discrete-time linear systems with arbitrary time-invariant probabilistic uncertainties and additive Gaussian process noise. Closed-loop stability of the SMPC approach is established by appropriate selection of the cost function. Polynomial chaos is used for uncertainty propagation through system dynamics. The performance of the SMPC approach is demonstrated using the Van de Vusse reactions.
Jointly optimal transmission power control and remote estimation over an infinite horizon is studied. A sensor observes a dynamic process and sends its observations to a remote estimator over a wireless fading channel characterized by a time-homogeneous Markov chain. The successful transmission probability depends on both the channel gains and the transmission power used by the sensor. The transmission power control rule and the remote estimator should be jointly designed, aiming to minimize an infinite-horizon cost consisting of the power usage and the remote estimation error. A first question one may ask is: Does this joint optimization problem have a solution? We formulate the joint optimization problem as an average cost belief-state Markov decision process and answer the question by proving that there exists an optimal deterministic and stationary policy. We then show that when the monitored dynamic process is scalar, the optimal remote estimates depend only on the most recently received sensor observation, and the optimal transmission power is symmetric and monotonically increasing with respect to the innovation error.
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $\sigma$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors and proceeds using a simple assign-and-average approach. The components of the dataset posteriors are assigned to the proposed global model components by solving a regularized variant of the assignment problem. The global components are then updated based on these assignments by their mean under a KL divergence. For exponential family variational distributions, our formulation leads to an efficient non-parametric algorithm for computing the fused model. Our algorithm is easy to describe and implement, efficient, and competitive with state-of-the-art on motion capture analysis, topic modeling, and federated learning of Bayesian neural networks.