Mapping Natural Language Instructions to Mobile UI Action Sequences

63 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yang Li

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yang Li - Jiacong He - Xin Zhou

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.

قيم البحث

368 - Hongyuan Mei , Mohit Bansal , Matthew R. Walter 2015

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) tran slates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence regions salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to-date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We analyze our model through a series of ablations that elucidate the contributions of the primary components of our model.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

127 - Swaroop Mishra , Daniel Khashabi , Chitta Baral 2021

Humans (e.g., crowdworkers) have a remarkable ability in solving different tasks, by simply reading textual instructions that define them and looking at a few examples. NLP models built with the conventional paradigm, however, often struggle with gen eralization across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing challenge in AI is to build a model that is equipped with the understanding of human-readable instructions that define the tasks, and can generalize to new tasks. To study this, we introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions and 193k task instances. The instructions are obtained from crowdsourcing instructions used to collect existing NLP datasets and mapped to a unified schema. We adopt generative pre-trained language models to encode task-specific instructions along with input and generate task output. Our results indicate that models can benefit from instructions to generalize across tasks. These models, however, are far behind supervised task-specific models, indicating significant room for more progress in this direction.

الحساب واللغة الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

From Natural Language Instructions to Complex Processes: Issues in Chaining Trigger Action Rules

63 - Nobuhiro Ito , Yuya Suzuki , Akiko Aizawa 2020

Automation services for complex business processes usually require a high level of information technology literacy. There is a strong demand for a smartly assisted process automation (IPA: intelligent process automation) service that enables even gen eral users to easily use advanced automation. A natural language interface for such automation is expected as an elemental technology for the IPA realization. The workflow targeted by IPA is generally composed of a combination of multiple tasks. However, semantic parsing, one of the natural language processing methods, for such complex workflows has not yet been fully studied. The reasons are that (1) the formal expression and grammar of the workflow required for semantic analysis have not been sufficiently examined and (2) the dataset of the workflow formal expression with its corresponding natural language description required for learning workflow semantics did not exist. This paper defines a new grammar for complex workflows with chaining machine-executable meaning representations for semantic parsing. The representations are at a high abstraction level. Additionally, an approach to creating datasets is proposed based on this grammar.

الذكاء الاصطناعي الحساب واللغة

GANCoder: An Automatic Natural Language-to-Programming Language Translation Approach based on GAN

322 - Yabing Zhu , Yanfeng Zhang , Huili Yang 2019

We propose GANCoder, an automatic programming approach based on Generative Adversarial Networks (GAN), which can generate the same functional and logical programming language codes conditioned on the given natural language utterances. The adversarial training between generator and discriminator helps generator learn distribution of dataset and improve code generation quality. Our experimental results show that GANCoder can achieve comparable accuracy with the state-of-the-art methods and is more stable when programming languages.

الحساب واللغة التعلم الآلي

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

75 - Haonan Chen , Hao Tan , Alan Kuntz 2019

Enabling robots to understand instructions provided via spoken natural language would facilitate interaction between robots and people in a variety of settings in homes and workplaces. However, natural language instructions are often missing informat ion that would be obvious to a human based on environmental context and common sense, and hence does not need to be explicitly stated. In this paper, we introduce Language-Model-based Commonsense Reasoning (LMCR), a new method which enables a robot to listen to a natural language instruction from a human, observe the environment around it, and automatically fill in information missing from the instruction using environmental context and a new commonsense reasoning approach. Our approach first converts an instruction provided as unconstrained natural language into a form that a robot can understand by parsing it into verb frames. Our approach then fills in missing information in the instruction by observing objects in its vicinity and leveraging commonsense reasoning. To learn commonsense reasoning automatically, our approach distills knowledge from large unstructured textual corpora by training a language model. Our results show the feasibility of a robot learning commonsense knowledge automatically from web-based textual corpora, and the power of learned commonsense reasoning models in enabling a robot to autonomously perform tasks based on incomplete natural language instructions.

علم الروبوتات الذكاء الاصطناعي الحساب واللغة