ﻻ يوجد ملخص باللغة العربية
Temporal observations such as videos contain essential information about the dynamics of the underlying scene, but they are often interleaved with inessential, predictable details. One way of dealing with this problem is by focusing on the most informative moments in a sequence. We propose a model that learns to discover these important events and the times when they occur and uses them to represent the full sequence. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates a videos keyframes and then inpaints the rest by generating the frames at the intervening times. We propose a fully differentiable formulation to efficiently learn this procedure. We show that KeyIn finds informative keyframes in several datasets with different dynamics and visual properties. KeyIn outperforms other recent hierarchical predictive models for planning. For more details, please see the project website at url{https://sites.google.com/view/keyin}.
When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models f
Imitation learning trains control policies by mimicking pre-recorded expert demonstrations. In partially observable settings, imitation policies must rely on observation histories, but many seemingly paradoxical results show better performance for po
In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the players avatar in video games and the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video p
Yarbus claim to decode the observers task from eye movements has received mixed reactions. In this paper, we have supported the hypothesis that it is possible to decode the task. We conducted an exploratory analysis on the dataset by projecting featu
In this paper, we present LookOut, a novel autonomy system that perceives the environment, predicts a diverse set of futures of how the scene might unroll and estimates the trajectory of the SDV by optimizing a set of contingency plans over these fut