Intelligent robots designed to interact with humans in real scenarios need to be able to actively refer to entities in natural language. In spatial referring expression generation, ambiguity is unavoidable due to the diversity of reference frames, which leads to an understanding gap between humans and robots. To narrow this gap, in this paper we propose a novel perspective-corrected spatial referring expression generation (PcSREG) approach for human-robot interaction that takes the selection of reference frames into account. The task of referring expression generation is simplified into the process of generating diverse spatial relation units. First, we select the landmarks of these spatial relation units according to the entropy of preference and allow the preference to be updated through a stack model. Then all possible referring expressions are generated according to different reference frame strategies. Finally, we evaluate every expression using a probabilistic referring expression resolution model and find the best expression that satisfies both appropriateness and effectiveness. We implement the proposed approach on a robot system, and empirical experiments show that our approach can generate more effective spatial referring expressions for practical applications.
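The selection step described above can be pictured as enumerating candidate (landmark, reference-frame) relation units and keeping the expression whose combined appropriateness and resolution probability is highest. The sketch below is only an illustration under that reading, not the paper's implementation; `preference_entropy`, `resolution_prob`, `appropriateness`, and the `frame.describe` helper are hypothetical placeholders.

```python
import math
from itertools import product

def preference_entropy(counts):
    """Shannon entropy of observed reference-frame preferences (hypothetical helper)."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def best_expression(target, landmarks, frames, resolution_prob, appropriateness):
    """Enumerate (landmark, reference-frame) spatial relation units and keep the
    expression with the highest combined appropriateness/effectiveness score."""
    best, best_score = None, float("-inf")
    for landmark, frame in product(landmarks, frames):
        # A spatial relation unit rendered as text, e.g. "the cup to my left of the box".
        expr = f"the {target} {frame.describe(target, landmark)} the {landmark}"
        effectiveness = resolution_prob(expr, target)  # P(listener resolves expr to target)
        score = appropriateness(expr) * effectiveness
        if score > best_score:
            best, best_score = expr, score
    return best
```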
This paper presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The core issue is the grounding of referring expressions: inferring objects and their relationships from input images and language expressions. INGRESS allows for unconstrained object categories and unconstrained language expressions. Further, it asks questions to disambiguate referring expressions interactively. To achieve this, we take the approach of grounding by generation and propose a two-stage neural-network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred object. The same neural networks are used for both grounding and question generation for disambiguation. Experiments show that INGRESS outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans.
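At a high level, the grounding-by-generation pipeline can be sketched as follows. This is a simplified illustration, not the INGRESS implementation; `caption_model`, `relation_model`, and `similarity` stand in for the paper's two neural networks and its expression-matching step, and the threshold is an arbitrary assumption.

```python
def ground(image_regions, expression, caption_model, relation_model, similarity,
           threshold=0.8):
    """Two-stage grounding with optional disambiguation (hypothetical components)."""
    # Stage 1: generate a visual description of each region and compare it with
    # the input expression to form a candidate set.
    scored = [(region, similarity(caption_model(region), expression))
              for region in image_regions]
    candidates = [region for region, score in scored if score >= threshold]
    if len(candidates) <= 1:
        return (candidates[0] if candidates else None), None

    # Stage 2: score all pairwise relations among candidates against the
    # relational part of the expression and pick the most likely referent.
    best = max(candidates,
               key=lambda r: max(relation_model(r, other, expression)
                                 for other in candidates if other is not r))

    # The same generative model can phrase a disambiguating question,
    # e.g. "Do you mean the red cup on the left?".
    question = f"Do you mean {caption_model(best)}?"
    return best, question
```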
Human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects specified by unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding: inferring objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses visual descriptions in the referring expressions to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expressions. A key feature of our system is that, by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects.
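To make the second stage concrete, the snippet below shows a deliberately simplified, rule-based stand-in for the learned pairwise relation step: candidates are compared by their bounding-box centers against a spatial predicate taken from the expression. The predicate table, the `cx`/`cy` fields, and `matches_anchor` are assumptions for illustration only and do not reflect the paper's network.

```python
# Rule-based stand-in for the learned second-stage relation reasoning (illustration only).
SPATIAL_PREDICATES = {
    "left of":  lambda a, b: a["cx"] < b["cx"],
    "right of": lambda a, b: a["cx"] > b["cx"],
    "above":    lambda a, b: a["cy"] < b["cy"],
    "below":    lambda a, b: a["cy"] > b["cy"],
}

def most_likely_referent(candidates, relation, anchor_phrase, matches_anchor):
    """Return a candidate that satisfies `relation` with respect to another
    candidate matching the anchor phrase, e.g. 'the cup left of the bowl'."""
    predicate = SPATIAL_PREDICATES[relation]
    for obj in candidates:
        for other in candidates:
            if other is not obj and matches_anchor(other, anchor_phrase) \
                    and predicate(obj, other):
                return obj
    return None
```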
In this paper, we propose the Interactive Text2Pickup (IT2P) network for human-robot collaboration, which enables effective interaction with a human user despite ambiguity in the user's commands. We focus on the task in which a robot is expected to pick up an object specified by a human and to interact with the human when the given instruction is vague. The proposed network first interprets the command from the human user and estimates the position of the desired object. To handle the inherent ambiguity in human language commands, a suitable question that can resolve the ambiguity is generated. The user's answer to the question is combined with the initial command and fed back to the network, resulting in a more accurate estimate. Experimental results show that, given unambiguous commands, the proposed method can estimate the position of the requested object with an accuracy of 98.49% on our test dataset. Given ambiguous language commands, we show that the accuracy of the pick-up task increases by a factor of 1.94 after incorporating the information obtained from the interaction.
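The interaction described above can be summarized as an estimate-ask-re-estimate loop, sketched below with hypothetical components (`estimate`, `make_question`, `ask_user`) and an assumed confidence threshold; the actual IT2P network learns these steps rather than composing them this way.

```python
def interactive_pickup(image, command, estimate, make_question, ask_user,
                       confidence_threshold=0.9):
    """Estimate the pick position; if the estimate is ambiguous, ask a question
    and fold the answer back into the command before re-estimating."""
    position, confidence = estimate(image, command)
    while confidence < confidence_threshold:
        question = make_question(image, command)   # e.g. "The red block or the blue one?"
        answer = ask_user(question)
        command = f"{command} {answer}"            # augment the original command
        position, confidence = estimate(image, command)
    return position
```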
To engage in human-like dialogue, robots require the ability to describe the objects, locations, and people in their environment, a capability known as Referring Expression Generation. As speakers repeatedly refer to similar objects, they tend to re-use properties from previous descriptions, in part to help the listener, and in part due to the cognitive availability of those properties in working memory (WM). Because different theories of working memory forgetting necessarily lead to differences in cognitive availability, we hypothesize that they will similarly result in the generation of different referring expressions. To design effective intelligent agents, it is thus necessary to determine how different models of forgetting may be differentially effective at producing natural, human-like referring expressions. In this work, we computationalize two candidate models of working memory forgetting within a robot cognitive architecture and demonstrate how they lead to cognitive-availability-based differences in generated referring expressions.
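For illustration only, one simple decay-based account of working-memory forgetting could be modeled as below; this toy model is an assumption for the sketch and is not one of the paper's two candidate models. Properties whose activation has decayed below a threshold are no longer "available" and are therefore not preferentially re-used in new descriptions.

```python
import math
import time

class DecayingWM:
    """Toy decay-based working-memory model (illustrative assumption only)."""

    def __init__(self, decay_rate=0.05, threshold=0.3):
        self.decay_rate = decay_rate
        self.threshold = threshold
        self.last_used = {}  # property -> timestamp of last use in a description

    def use(self, prop):
        self.last_used[prop] = time.time()

    def available(self, prop):
        t = self.last_used.get(prop)
        if t is None:
            return False
        activation = math.exp(-self.decay_rate * (time.time() - t))
        return activation >= self.threshold

    def select_properties(self, distinguishing_props):
        """Prefer properties still available in WM when generating a new referring expression."""
        reused = [p for p in distinguishing_props if self.available(p)]
        return reused or list(distinguishing_props)
```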
Mobile robots have recently been deployed in public spaces such as shopping malls, airports, and urban sidewalks. Most of these robots are designed with human-aware motion planning capabilities but are not designed to communicate with pedestrians. Pedestrians who encounter these robots without a prior understanding of the robots' behaviour can experience discomfort, confusion, and delayed social acceptance. In this work we designed and evaluated nonverbal robot motion legibility cues, which communicate a mobile robot's motion intention to pedestrians. We compared a motion legibility cue using Projected Arrows to one using Flashing Lights. We designed the cues to communicate path information, goal information, or both, and explored different Robot Movement Scenarios. We conducted an online user study with 229 participants using videos of the motion legibility cues. Our results show that the absence of cues was not socially acceptable, and that Projected Arrows were the more socially acceptable cue in most experimental conditions. We conclude that the presence and choice of motion legibility cues can positively influence robots' acceptance and successful deployment in public spaces.