Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

A persuasion dialogue system reflects the machine's ability to make strategic moves beyond verbal communication, which differentiates it from task-oriented and open-domain dialogue systems and gives it its own unique value. However, repetition and inconsistency problems persist in dialogue response generation; they can substantially harm user experience and impede the persuasion outcome. In addition, although reinforcement learning (RL) approaches have achieved great success in strategic tasks such as games, they require a sophisticated user simulator to provide real-time feedback to the dialogue system, which limits the application of RL to persuasion dialogues. To address these issues, we apply RL to refine a language model baseline without user simulators, and distill sentence-level information about repetition, inconsistency, and task relevance through rewards. Moreover, to better accomplish the persuasion task, the model learns from human demonstration to imitate human persuasion behavior and select the most persuasive responses. Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation on a donation persuasion task, and generates more diverse, consistent, and persuasive conversations according to user feedback. We will make the code and model publicly available.
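
As a rough illustration of what a sentence-level repetition reward could look like, the sketch below scores a candidate sentence by how many of its word bigrams already appear in an earlier system sentence. The function names, the bigram overlap measure, and the 0-to-1 reward scale are assumptions made for illustration; the abstract does not say how the paper actually computes its rewards.

```python
from typing import List, Set, Tuple


def bigrams(sentence: str) -> Set[Tuple[str, str]]:
    """Return the set of lowercase word bigrams in a sentence."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))


def repetition_reward(candidate: str, history: List[str]) -> float:
    """Hypothetical reward: 1.0 for a fully novel sentence, 0.0 for a repeat.

    Computed as 1 minus the largest fraction of the candidate's bigrams
    that already appear in any single earlier system sentence.
    """
    cand = bigrams(candidate)
    if not cand:
        return 1.0  # too short to judge; treat as non-repetitive
    worst = max(
        (len(cand & bigrams(h)) / len(cand) for h in history),
        default=0.0,  # empty history: nothing to repeat
    )
    return 1.0 - worst


# Example: an exact repeat scores 0.0, an unrelated sentence scores 1.0.
history = ["Would you like to donate to the charity today?"]
print(repetition_reward("Would you like to donate to the charity today?", history))
print(repetition_reward("Every dollar helps children in need.", history))
```

In an RL setup of the kind the abstract describes, a score like this would be one of several reward terms, alongside separate (here unspecified) signals for inconsistency and task relevance.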

References used

https://aclanthology.org/

A Proposal: Interactively Learning to Summarise Timelines by Reinforcement Learning

Association for Computational Linguistics, 2021 (article)

Timeline Summarisation (TLS) aims to generate a concise, time-ordered list of events described in sources such as news articles. However, current systems do not provide an adequate way to adapt to new domains nor to focus on the aspects of interest to a particular user. Therefore, we propose a method for interactively learning abstractive TLS using Reinforcement Learning (RL). We define a compound reward function and use RL to fine-tune an abstractive Multi-document Summarisation (MDS) model, which avoids the need to train using reference summaries. One of the sub-reward functions will be learned interactively from user feedback to ensure the consistency between users' demands and the generated timeline. The other sub-reward functions contribute to topical coherence and linguistic fluency. We plan experiments to evaluate whether our approach could generate accurate and precise timelines tailored for each user.

Keywords: summarise timelines, learning abstractive TLS

Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

Association for Computational Linguistics, 2021 (article)

Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.

Keywords: human feedback, feedback in real-world, offline reinforcement learning

On Reducing Repetition in Abstractive Summarization

Association for Computational Linguistics, 2021 (article)

Repetition in natural language generation reduces the informativeness of text and makes it less appealing. Various techniques have been proposed to alleviate it. In this work, we explore and propose techniques to reduce repetition in abstractive summarization. First, we explore the application of unlikelihood training and embedding matrix regularizers from previous work on language modeling to abstractive summarization. Next, we extend the coverage and temporal attention mechanisms to the token level to reduce repetition. In our experiments on the CNN/Daily Mail dataset, we observe that these techniques reduce the amount of repetition and increase the informativeness of the summaries, which we confirm via human evaluation.

Keywords: reducing repetition

Quantitative Day Trading from Natural Language using Reinforcement Learning

Association for Computational Linguistics, 2021 (article)

It is challenging to design profitable and practical trading strategies, as stock price movements are highly stochastic, and the market is heavily influenced by chaotic data across sources like news and social media. Existing NLP approaches largely treat stock prediction as a classification or regression problem and are not optimized to make profitable investment decisions. Further, they do not model the temporal dynamics of large volumes of diversely influential text to which the market responds quickly. Building on these shortcomings, we propose a deep reinforcement learning approach that makes time-aware decisions to trade stocks while optimizing profit using textual data. Our method outperforms state-of-the-art in terms of risk-adjusted returns in trading simulations on two benchmarks: Tweets (English) and financial news (Chinese) pertaining to two major indexes and four global stock markets. Through extensive experiments and studies, we build the case for our method as a tool for quantitative trading.

Keywords: quantitative day trading, day trading

ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

Association for Computational Linguistics, 2021 (article)

Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning to improve performance. Graph linearization enables us to re-frame both tasks as a sequence to sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020 by significantly improving upon published results from the WebNLG 2020+ Challenge for both text-to-graph and graph-to-text generation tasks. More details at https://github.com/IBM/regen.

Keywords: knowledge base generation
