ترغب بنشر مسار تعليمي؟ اضغط هنا

Simultaneous localization and mapping (SLAM) remains challenging for a number of downstream applications, such as visual robot navigation, because of rapid turns, featureless walls, and poor camera quality. We introduce the Differentiable SLAM Networ k (SLAM-net) along with a navigation architecture to enable planar robot navigation in previously unseen indoor environments. SLAM-net encodes a particle filter based SLAM algorithm in a differentiable computation graph, and learns task-oriented neural network components by backpropagating through the SLAM algorithm. Because it can optimize all model components jointly for the end-objective, SLAM-net learns to be robust in challenging conditions. We run experiments in the Habitat platform with different real-world RGB and RGB-D datasets. SLAM-net significantly outperforms the widely adapted ORB-SLAM in noisy conditions. Our navigation architecture with SLAM-net improves the state-of-the-art for the Habitat Challenge 2020 PointNav task by a large margin (37% to 64% success). Project website: http://sites.google.com/view/slamnet
121 - Xiao Ma , David Hsu , Wee Sun Lee 2021
Manipulating deformable objects, such as cloth and ropes, is a long-standing challenge in robotics: their large number of degrees of freedom (DoFs) and complex non-linear dynamics make motion planning extremely difficult. This work aims to learn late nt Graph dynamics for DefOrmable Object Manipulation (G-DOOM). To tackle the challenge of many DoFs and complex dynamics, G-DOOM approximates a deformable object as a sparse set of interacting keypoints and learns a graph neural network that captures abstractly the geometry and interaction dynamics of the keypoints. Further, to tackle the perceptual challenge, specifically, object self-occlusion, G-DOOM adds a recurrent neural network to track the keypoints over time and condition their interactions on the history. We then train the resulting recurrent graph dynamics model through contrastive learning in a high-fidelity simulator. For manipulation planning, G-DOOM explicitly reasons about the learned dynamics model through model-predictive control applied at each of the keypoints. We evaluate G-DOOM on a set of challenging cloth and rope manipulation tasks and show that G-DOOM outperforms a state-of-the-art method. Further, although trained entirely on simulation data, G-DOOM transfers directly to a real robot for both cloth and rope manipulation in our experiments.
126 - Xiao Ma , Siwei Chen , David Hsu 2020
Deep reinforcement learning (DRL) has achieved significant success in various robot tasks: manipulation, navigation, etc. However, complex visual observations in natural environments remains a major challenge. This paper presents Contrastive Variatio nal Reinforcement Learning (CVRL), a model-based method that tackles complex visual observations in DRL. CVRL learns a contrastive variational model by maximizing the mutual information between latent states and observations discriminatively, through contrastive learning. It avoids modeling the complex observation space unnecessarily, as the commonly used generative observation model often does, and is significantly more robust. CVRL achieves comparable performance with state-of-the-art model-based DRL methods on standard Mujoco tasks. It significantly outperforms them on Natural Mujoco tasks and a robot box-pushing task with complex observations, e.g., dynamic shadows. The CVRL code is available publicly at https://github.com/Yusufma03/CVRL.
141 - Siwei Chen , Xiao Ma , David Hsu 2020
It has been arduous to assess the progress of a policy learning algorithm in the domain of hierarchical task with high dimensional action space due to the lack of a commonly accepted benchmark. In this work, we propose a new light-weight benchmark ta sk called Diner Dash for evaluating the performance in a complicated task with high dimensional action space. In contrast to the traditional Atari games that only have a flat structure of goals and very few actions, the proposed benchmark task has a hierarchical task structure and size of 57 for the action space and hence can facilitate the development of policy learning in complicated tasks. On top of that, we introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines both graph modelling and deep learning to allow explicit domain knowledge embedding and achieves significant improvement comparing to the baseline. In the experiments, we have shown the effectiveness of the domain knowledge injection via a specially designed imitation algorithm as well as results of other popular algorithms.
Deep reinforcement learning is successful in decision making for sophisticated games, such as Atari, Go, etc. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter in the neural network policy for explicit reasoning with partial observations over time. The particle filter maintains a belief using learned discriminative update, which is trained end-to-end for decision making. We show that using the discriminative update instead of standard generative models results in significantly improved performance, especially for tasks with complex visual observations, because they circumvent the difficulty of modeling complex observations that are irrelevant to decision making. In addition, to extract features from the particle belief, we propose a new type of belief feature based on the moment generating function. DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this paper. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.
Autonomous driving in mixed traffic requires reliable motion prediction of nearby traffic agents such as pedestrians, bicycles, cars, buses, etc.. This prediction problem is extremely challenging because of the diverse dynamics and geometry of traffi c agents, complex road conditions, and intensive interactions among the agents. In this paper, we proposed GAMMA, a general agent motion prediction model for autonomous driving, that can predict the motion of heterogeneous traffic agents with different kinematics, geometry, human agents inner states, etc.. GAMMA formalizes motion prediction as geometric optimization in the velocity space, and integrates physical constraints and human inner states into this unified framework. Our results show that GAMMA outperforms state-of-the-art approaches significantly on diverse real-world datasets.
152 - Xiao Ma , Peter Karkus , David Hsu 2019
Recurrent neural networks (RNNs) have been extraordinarily successful for prediction with sequential data. To tackle highly variable and noisy real-world data, we introduce Particle Filter Recurrent Neural Networks (PF-RNNs), a new RNN family that ex plicitly models uncertainty in its internal structure: while an RNN relies on a long, deterministic latent state vector, a PF-RNN maintains a latent state distribution, approximated as a set of particles. For effective learning, we provide a fully differentiable particle filter algorithm that updates the PF-RNN latent state distribution according to the Bayes rule. Experiments demonstrate that the proposed PF-RNNs outperform the corresponding standard gated RNNs on a synthetic robot localization dataset and 10 real-world sequence prediction datasets for text classification, stock price prediction, etc.
This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is tra ined end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map.
We propose to take a novel approach to robot system design where each building block of a larger system is represented as a differentiable program, i.e. a deep neural network. This representation allows for integrating algorithmic planning and deep l earning in a principled manner, and thus combine the benefits of model-free and model-based methods. We apply the proposed approach to a challenging partially observable robot navigation task. The robot must navigate to a goal in a previously unseen 3-D environment without knowing its initial location, and instead relying on a 2-D floor map and visual observations from an onboard camera. We introduce the Navigation Networks (NavNets) that encode state estimation, planning and acting in a single, end-to-end trainable recurrent neural network. In preliminary simulation experiments we successfully trained navigation networks to solve the challenging partially observable navigation task.
143 - Mohit Shridhar , David Hsu 2018
This paper presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The core issue here is the grounding of referring expressions: infer objects and their relationships from input images an d language expressions. INGRESS allows for unconstrained object categories and unconstrained language expressions. Further, it asks questions to disambiguate referring expressions interactively. To achieve these, we take the approach of grounding by generation and propose a two-stage neural network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred object. The same neural networks are used for both grounding and question generation for disambiguation. Experiments show that INGRESS outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا