No Arabic abstract
We demonstrate that challenging shortest path problems can be solved via direct spline regression from a neural network, trained in an unsupervised manner (i.e. without requiring ground truth optimal paths for training). To achieve this, we derive a geometry-dependent optimal cost function whose minima guarantees collision-free solutions. Our method beats state-of-the-art supervised learning baselines for shortest path planning, with a much more scalable training pipeline, and a significant speedup in inference time.
While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real world settings where aspects of the environment needed to compute progress are not directly accessible. To enable robots to autonomously learn skills, we instead consider the problem of reinforcement learning without access to rewards. We aim to learn an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our approach explicitly optimizes for a metric space under which action sequences that reach a particular state are optimal when the goal is the final state reached. This enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms. Our experiments on three simulated environments and two real-world manipulation problems show that our method can learn effective goal metrics from unlabeled interaction, and use the learned goal metrics for autonomous reinforcement learning.
Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their real-world applicability. In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that can directly generate a long-term symbolic plan conditioned on high-dimensional observations. We borrow the idea of regression (backward) planning from classical planning literature and introduce Regression Planning Networks (RPN), a neural network architecture that plans backward starting at a task goal and generates a sequence of intermediate goals that reaches the current observation. We show that our model not only inherits many favorable traits from symbolic planning, e.g., the ability to solve previously unseen tasks but also can learn from visual inputs in an end-to-end manner. We evaluate the capabilities of RPN in a grid world environment and a simulated 3D kitchen environment featuring complex visual scenes and long task horizons, and show that it achieves near-optimal performance in completely new task instances.
Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently. In this work, we propose to learn forward and inverse dynamics in a fully unsupervised manner via contrastive estimation. Specifically, we train a forward dynamics model and an inverse dynamics model in the feature space of states and actions with data collected from random exploration. Unlike most existing deterministic models, our energy-based model takes into account the stochastic nature of agent-environment interactions. We demonstrate the efficacy of our approach across a variety of tasks including goal-directed planning and imitation from observations. Project videos and code are at https://jianrenw.github.io/cloud/.
Heterogeneous Information Network (HIN) has attracted much attention due to its wide applicability in a variety of data mining tasks, especially for tasks with multi-typed objects. A potentially large number of meta-paths can be extracted from the heterogeneous networks, providing abundant semantic knowledge. Though a variety of meta-paths can be defined, too many meta-paths are redundant. Reduction on the number of meta-paths can enhance the effectiveness since some redundant meta-paths provide interferential linkage to the task. Moreover, the reduced meta-paths can reflect the characteristic of the heterogeneous network. Previous endeavors try to reduce the number of meta-paths under the guidance of supervision information. Nevertheless, supervised information is expensive and may not always be available. In this paper, we propose a novel algorithm, SPMR (Semantic Preserving Meta-path Reduction), to reduce a set of pre-defined meta-paths in an unsupervised setting. The proposed method is able to evaluate a set of meta-paths to maximally preserve the semantics of original meta-paths after reduction. Experimental results show that SPMR can select a succinct subset of meta-paths which can achieve comparable or even better performance with fewer meta-paths.
We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.