ترغب بنشر مسار تعليمي؟ اضغط هنا

Discretizing Dynamics for Maximum Likelihood Constraint Inference

322   0   0.0 ( 0 )
 نشر من قبل Kaylene Stocking
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We present a method to generate a tabular state-action space that approximates continuous dynamics and can be used for constraint inference on demonstrations that obey the true system dynamics. We then demonstrate accurate constraint inference on nonlinear pendulum systems with 2- and 4-dimensional state spaces, and show that performance is robust to a range of hyperparameters. The demonstrations are not required to be fully optimal with respect to the objective, and the most likely constraints can be identified even when demonstrations cover only a small portion of the state space. For these reasons, the proposed approach may be especially useful for inferring constraints on human demonstrators, which has important applications in human-robot interaction and biomechanical medicine.



قيم البحث

اقرأ أيضاً

The Reward-Biased Maximum Likelihood Estimate (RBMLE) for adaptive control of Markov chains was proposed to overcome the central obstacle of what is variously called the fundamental closed-identifiability problem of adaptive control, the dual control problem, or, contemporaneously, the exploration vs. exploitation problem. It exploited the key observation that since the maximum likelihood parameter estimator can asymptotically identify the closed-transition probabilities under a certainty equivalent approach, the limiting parameter estimates must necessarily have an optimal reward that is less than the optimal reward attainable for the true but unknown system. Hence it proposed a counteracting reverse bias in favor of parameters with larger optimal rewards, providing a solution to the fundamental problem alluded to above. It thereby proposed an optimistic approach of favoring parameters with larger optimal rewards, now known as optimism in the face of uncertainty. The RBMLE approach has been proved to be long-term average reward optimal in a variety of contexts. However, modern attention is focused on the much finer notion of regret, or finite-time performance. Recent analysis of RBMLE for multi-armed stochastic bandits and linear contextual bandits has shown that it not only has state-of-the-art regret, but it also exhibits empirical performance comparable to or better than the best current contenders, and leads to strikingly simple index policies. Motivated by this, we examine the finite-time performance of RBMLE for reinforcement learning tasks that involve the general problem of optimal control of unknown Markov Decision Processes. We show that it has a regret of $mathcal{O}( log T)$ over a time horizon of $T$ steps, similar to state-of-the-art algorithms. Simulation studies show that RBMLE outperforms other algorithms such as UCRL2 and Thompson Sampling.
56 - Farzan Memarian , Zhe Xu , Bo Wu 2020
We consider the problem of reward learning for temporally extended tasks. For reward learning, inverse reinforcement learning (IRL) is a widely used paradigm. Given a Markov decision process (MDP) and a set of demonstrations for a task, IRL learns a reward function that assigns a real-valued reward to each state of the MDP. However, for temporally extended tasks, the underlying reward function may not be expressible as a function of individual states of the MDP. Instead, the history of visited states may need to be considered to determine the reward at the current state. To address this issue, we propose an iterative algorithm to learn a reward function for temporally extended tasks. At each iteration, the algorithm alternates between two modules, a task inference module that infers the underlying task structure and a reward learning module that uses the inferred task structure to learn a reward function. The task inference module produces a series of queries, where each query is a sequence of subgoals. The demonstrator provides a binary response to each query by attempting to execute it in the environment and observing the environments feedback. After the queries are answered, the task inference module returns an automaton encoding its current hypothesis of the task structure. The reward learning module augments the state space of the MDP with the states of the automaton. The module then proceeds to learn a reward function over the augmented state space using a novel deep maximum entropy IRL algorithm. This iterative process continues until it learns a reward function with satisfactory performance. The experiments show that the proposed algorithm significantly outperforms several IRL baselines on temporally extended tasks.
132 - Amit Meir , Mathias Drton 2017
Applying standard statistical methods after model selection may yield inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. The main issue is the fact that the post-selection distribution of the data differs fro m the original distribution. In particular, the observed data is constrained to lie in a subset of the original sample space that is determined by the selected model. This often makes the post-selection likelihood of the observed data intractable and maximum likelihood inference difficult. In this work, we get around the intractable likelihood by generating noisy unbiased estimates of the post-selection score function and using them in a stochastic ascent algorithm that yields correct post-selection maximum likelihood estimates. We apply the proposed technique to the problem of estimating linear models selected by the lasso. In an asymptotic analysis the resulting estimates are shown to be consistent for the selected parameters and to have a limiting truncated normal distribution. Confidence intervals constructed based on the asymptotic distribution obtain close to nominal coverage rates in all simulation settings considered, and the point estimates are shown to be superior to the lasso estimates when the true model is sparse.
We present a framework for bi-level trajectory optimization in which a systems dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimiz er. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth forces to be abstracted away from the upper-level optimizer, and allows classical unconstrained optimizers to synthesize trajectories for more complex systems. We provide a path-following method for efficient evaluation of constrained dynamics and utilize the implicit-function theorem to compute smooth gradients of this representation. We demonstrate the framework by modeling systems from locomotion, aerospace, and manipulation domains including: acrobot with joint limits, cart-pole subject to Coulomb friction, Raibert hopper, rocket landing with thrust limits, and planar-push task with optimization-based dynamics and then optimize trajectories using iterative LQR.
We derive Laplace-approximated maximum likelihood estimators (GLAMLEs) of parameters in our Graph Generalized Linear Latent Variable Models. Then, we study the statistical properties of GLAMLEs when the number of nodes $n_V$ and the observed times of a graph denoted by $K$ diverge to infinity. Finally, we display the estimation results in a Monte Carlo simulation considering different numbers of latent variables. Besides, we make a comparison between Laplace and variational approximations for inference of our model.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا