ترغب بنشر مسار تعليمي؟ اضغط هنا

Simplifying Reward Design through Divide-and-Conquer

121   0   0.0 ( 0 )
 نشر من قبل Ellis Ratner
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating. The reward needs to work across multiple different environments, and that often requires many iterations of tuning. We introduce a novel divide-and-conquer approach that enables the designer to specify a reward separately for each environment. By treating these separate reward functions as observations about the underlying true reward, we derive an approach to infer a common reward across all environments. We conduct user studies in an abstract grid world domain and in a motion planning domain for a 7-DOF manipulator that measure user effort and solution quality. We show that our method is faster, easier to use, and produces a higher quality solution than the typical method of designing a reward jointly across all environments. We additionally conduct a series of experiments that measure the sensitivity of these results to different properties of the reward design task, such as the number of environments, the number of feasible solutions per environment, and the fraction of the total features that vary within each environment. We find that independent reward design outperforms the standard, joint, reward design process but works best when the design problem can be divided into simpler subproblems.

قيم البحث

اقرأ أيضاً

In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification are either 1) classifier-sp ecific and not generic, or 2) independently perform clustering and classifier training, which may not form clusters that can potentially benefit classifier performance. The question of how to perform clustering to improve the performance of classifiers trained on the clusters has received scant attention in previous literature, despite its importance in several real-world applications. In this paper, we design a simple and efficient classification algorithm called Clustering Aware Classification (CAC), to find clusters that are well suited for being used as training datasets by classifiers for each underlying subpopulation. Our experiments on synthetic and real benchmark datasets demonstrate the efficacy of CAC over previous methods for combined clustering and classification.
We consider the learning of algorithmic tasks by mere observation of input-output pairs. Rather than studying this as a black-box discrete regression problem with no assumption whatsoever on the input-output mapping, we concentrate on tasks that are amenable to the principle of divide and conquer, and study what are its implications in terms of learning. This principle creates a powerful inductive bias that we leverage with neural architectures that are defined recursively and dynamically, by learning two scale-invariant atomic operations: how to split a given input into smaller sets, and how to merge two partially solved tasks into a larger partial solution. Our model can be trained in weakly supervised environments, namely by just observing input-output pairs, and in even weaker environments, using a non-differentiable reward signal. Moreover, thanks to the dynamic aspect of our architecture, we can incorporate the computational complexity as a regularization term that can be optimized by backpropagation. We demonstrate the flexibility and efficiency of the Divide-and-Conquer Network on several combinatorial and geometric tasks: convex hull, clustering, knapsack and euclidean TSP. Thanks to the dynamic programming nature of our model, we show significant improvements in terms of generalization error and computational complexity.
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partitions trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.
Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long time. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix within extremely low cost. Then clustering results can be computed quickly through a bipartite graph partition process. The proposed method achieves the lower computational complexity than most existing large-scale spectral clustering. Experimental results on ten large-scale datasets have demonstrated the efficiency and effectiveness of the proposed methods. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
Deep metric learning (DML) is a cornerstone of many computer vision applications. It aims at learning a mapping from the input domain to an embedding space, where semantically similar objects are located nearby and dissimilar objects far from another . The target similarity on the training data is defined by user in form of ground-truth class labels. However, while the embedding space learns to mimic the user-provided similarity on the training data, it should also generalize to novel categories not seen during training. Besides user-provided groundtruth training labels, a lot of additional visual factors (such as viewpoint changes or shape peculiarities) exist and imply different notions of similarity between objects, affecting the generalization on the images unseen during training. However, existing approaches usually directly learn a single embedding space on all available training data, struggling to encode all different types of relationships, and do not generalize well. We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts. We successively focus on smaller subsets of the training data, reducing its variance and learning a different embedding subspace for each data subset. Moreover, the subspaces are learned jointly to cover not only the intricacies, but the breadth of the data as well. Only after that, we build the final embedding from the subspaces in the conquering stage. The proposed algorithm acts as a transparent wrapper that can be placed around arbitrary existing DML methods. Our approach significantly improves upon the state-of-the-art on image retrieval, clustering, and re-identification tasks evaluated using CUB200-2011, CARS196, Stanford Online Products, In-shop Clothes, and PKU VehicleID datasets.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا