Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

66 0 0.0 ( 0 )

Download Cite

Added by Eric Tzeng

Publication date 2015

fields Informatics Engineering

and research's language is English

Authors Eric Tzeng - Coline Devin - Judy Hoffman

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Real-world robotics problems often occur in domains that differ significantly from the robots prior training environment. For many robotic control tasks, real world experience is expensive to obtain, but data is easy to collect in either an instrumented environment or in simulation. We propose a novel domain adaptation approach for robot perception that adapts visual representations learned on a large easy-to-obtain source dataset (e.g. synthetic images) to a target real-world domain, without requiring expensive manual data annotation of real world data before policy search. Supervised domain adaptation methods minimize cross-domain differences using pairs of aligned images that contain the same object or scene in both the source and target domains, thus learning a domain-invariant representation. However, they require manual alignment of such image pairs. Fully unsupervised adaptation methods rely on minimizing the discrepancy between the feature distributions across domains. We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains. Focusing on adapting from simulation to real world data using a PR2 robot, we evaluate our approach on a manipulation task and show that by using weakly paired images, our method compensates for domain shift more effectively than previous techniques, enabling better robot performance in the real world.

rate research

Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

123 - Rogerio Bonatti , Ratnesh Madaan , Vibhav Vineet 2019

Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning appear promising, they are constrained by the need of large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work we propose a novel method for learning robust visuomotor policies for real-world deployment which can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the relative pose of gates to the drone in the case of drone racing. We feed both data modalities into a novel factored architecture, which learns a joint low-dimensional embedding via Variational Auto Encoders. This compact representation is then fed into a control policy, which we trained using imitation learning with expert trajectories in a simulator. We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance as compared to end-to-end learning or purely unsupervised feature extractors. We also present real-world results for drone navigation through gates in different track configurations and environmental conditions. Our proposed method, which runs fully onboard, can successfully generalize the learned representations and policies across simulation and reality, significantly outperforming baseline approaches. Supplementary video: https://youtu.be/VKc3A5HlUU8

Computer Vision and Pattern Recognition Robotics

A Practical Maximum Clique Algorithm for Matching with Pairwise Constraints

236 - Alvaro Parra , Tat-Jun Chin , Frank Neumann 2019

A popular paradigm for 3D point cloud registration is by extracting 3D keypoint correspondences, then estimating the registration function from the correspondences using a robust algorithm. However, many existing 3D keypoint techniques tend to produce large proportions of erroneous correspondences or outliers, which significantly increases the cost of robust estimation. An alternative approach is to directly search for the subset of correspondences that are pairwise consistent, without optimising the registration function. This gives rise to the combinatorial problem of matching with pairwise constraints. In this paper, we propose a very efficient maximum clique algorithm to solve matching with pairwise constraints. Our technique combines tree searching with efficient bounding and pruning based on graph colouring. We demonstrate that, despite the theoretical intractability, many real problem instances can be solved exactly and quickly (seconds to minutes) with our algorithm, which makes our approach an excellent alternative to standard robust techniques for 3D registration.

Computer Vision and Pattern Recognition

A Novel Method for the Absolute Pose Problem with Pairwise Constraints

64 - Yinlong Liu , Xuechen Li , Manning Wang 2019

Absolute pose estimation is a fundamental problem in computer vision, and it is a typical parameter estimation problem, meaning that efforts to solve it will always suffer from outlier-contaminated data. Conventionally, for a fixed dimensionality d and the number of measurements N, a robust estimation problem cannot be solved faster than O(N^d). Furthermore, it is almost impossible to remove d from the exponent of the runtime of a globally optimal algorithm. However, absolute pose estimation is a geometric parameter estimation problem, and thus has special constraints. In this paper, we consider pairwise constraints and propose a globally optimal algorithm for solving the absolute pose estimation problem. The proposed algorithm has a linear complexity in the number of correspondences at a given outlier ratio. Concretely, we first decouple the rotation and the translation subproblems by utilizing the pairwise constraints, and then we solve the rotation subproblem using the branch-and-bound algorithm. Lastly, we estimate the translation based on the known rotation by using another branch-and-bound algorithm. The advantages of our method are demonstrated via thorough testing on both synthetic and real-world data

Computer Vision and Pattern Recognition

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

96 - Alexander Sax , Bradley Emi , Amir R. Zamir 2018

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills. Therefore, we refine our findings into an efficient max-coverage feature set that can be adopted in lieu of raw images. We perform our study in completely separate buildings for training and testing and compare against visually blind baseline policies and state-of-the-art feature learning methods.

Computer Vision and Pattern Recognition Artificial Intelligence Machine Learning

Operationalizing Individual Fairness with Pairwise Fair Representations

112 - Preethi Lahoti , Krishna P. Gummadi , 2019

We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.

Machine Learning Machine Learning