Intuitively, obedience -- following the order that a human gives -- seems like a good property for a robot to have. But we humans are not perfect, and we may give orders that are not well aligned with our preferences. We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order. Thus, there is a tradeoff between the obedience of a robot and the value it can attain for its owner. We investigate how this tradeoff is affected by the way the robot infers the human's preferences, showing that some methods err more on the side of obedience than others. We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or of the human's level of rationality. Finally, we study how robots can start detecting such model misspecification. Overall, our work suggests that there might be a middle ground in which robots intelligently decide when to obey human orders, but err on the side of obedience.
Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address the convergence issues of conventional Temporal Difference (TD) learning under off-policy training, but it differs from conventional TD learning even under on-policy training. A simple counterexample provided in 2017 pointed to a potential class of problems where ETD converges but TD diverges. In this paper, we empirically show that ETD converges on a few other well-known on-policy experiments, whereas TD either diverges or performs poorly. We also show that ETD outperforms TD on the mountain car prediction problem. Our results, together with a similar pattern observed under off-policy training in prior work, suggest that ETD might be a good substitute for conventional TD.
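To make the on-policy difference between ETD and TD concrete, the following is a minimal sketch of ETD(0) with tabular features on a small random-walk prediction task. It is an illustration only, not one of the experiments from the abstract: the environment, the function names (`true_values`, `etd0_random_walk`), the step size, and the discount factor are all assumptions chosen for the example. The sketch assumes on-policy training (importance ratio fixed at 1) and unit interest in every state, so the only change relative to TD(0) is that each update is scaled by the followon trace F.

```python
import numpy as np


def true_values(n_states=5, gamma=0.9):
    """Exact values for the hypothetical random walk, via a linear solve."""
    P = np.zeros((n_states, n_states))
    r = np.zeros(n_states)
    for s in range(n_states):
        if s - 1 >= 0:
            P[s, s - 1] = 0.5
        if s + 1 < n_states:
            P[s, s + 1] = 0.5
    r[n_states - 1] = 0.5  # stepping off the right end pays 1 with prob. 0.5
    return np.linalg.solve(np.eye(n_states) - gamma * P, r)


def etd0_random_walk(n_states=5, gamma=0.9, alpha=0.002, episodes=5000, seed=0):
    """On-policy ETD(0) with tabular features (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_states)  # one weight per state
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        F = 0.0            # followon trace, reset at the start of each episode
        while True:
            # Followon trace with rho = 1 (on-policy) and interest i(s) = 1.
            F = gamma * F + 1.0
            s_next = s + (1 if rng.random() < 0.5 else -1)
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else theta[s_next]
            delta = reward + gamma * v_next - theta[s]
            # TD(0) would use alpha * delta; ETD(0) scales the step by F.
            theta[s] += alpha * F * delta
            if done:
                break
            s = s_next
    return theta


if __name__ == "__main__":
    print("ETD(0) estimate:", np.round(etd0_random_walk(), 3))
    print("true values:    ", np.round(true_values(), 3))
```

Replacing `alpha * F * delta` with `alpha * delta` recovers conventional TD(0), which gives a simple way to compare the two updates on the same task. With tabular features both methods share the same fixed point, so this toy example shows only the mechanics of the emphasis weighting, not the divergence cases discussed above.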
In a recent Letter [G. Chiribella et al., Phys. Rev. Lett. 98, 120501 (2007)], four protocols were proposed to secretly transmit a reference frame. Here we point out that in these protocols an eavesdropper can change the transmitted reference frame without being detected, which means that the consistency of the shared reference frames should be reexamined. We discuss how this consistency can be checked, and show that the problem is quite different from that in previous protocols of quantum cryptography.
The challenge of robotic reproduction -- making new robots by recombining two existing ones -- has recently been cracked, and physically evolving robot systems have come within reach. Here we address the next big hurdle: producing an adequate brain for a newborn robot. In particular, we address the task of targeted locomotion, which is arguably a fundamental skill in any practical implementation. We introduce a controller architecture and a generic learning method that allow a modular robot with an arbitrary shape to learn to walk towards a target and to follow this target if it moves. Our approach is validated on three robots, a spider, a gecko, and their offspring, in three real-world scenarios.
Providing architectural support is crucial for newly arising applications to achieve high performance and high system efficiency. There is currently a trend toward designing accelerators for special applications, which has sparked a debate over whether we should customize the architecture for each application. In this study, we introduce what we refer to as Gene-Patterns, which are the base patterns of diverse applications. We present a Recursive Reduce methodology to identify the hotspots, and provide a HOtspot Trace Suite (HOTS) for the research community. We first extract the hotspot patterns and then remove the redundancy to obtain the base patterns. We find that although the number of applications is huge and ever-increasing, the number of base patterns is relatively small, due to the similarity among the patterns of diverse applications. The similarity stems not only from the algorithms but also from the data structures. We build the Periodic Table of Memory Access Patterns (PT-MAP), where the indifference curves are analogous to the energy levels in physics, and memory performance optimization is essentially an energy level transition. We find that inefficiency results from the mismatch between some of the base patterns and the micro-architecture of modern processors. We have identified the key micro-architecture demands of the base patterns. The Gene-Pattern concept, methodology, and toolkit will facilitate the design of both hardware and software to match architectures with applications.
Commodity cloud computing, as provided by commercial vendors such as Amazon, Google, and Microsoft, has revolutionized computing in many sectors. With the advent of a new class of big-data, public-access astronomical facilities such as LSST, DKIST, and WFIRST, there exists a real opportunity to combine these missions with cloud computing platforms and fundamentally change the way astronomical data is collected, processed, archived, and curated. Making these changes in a cross-mission, coordinated way can provide unprecedented economies of scale in personnel, data collection and management, archiving, algorithm and software development and, most importantly, science.