A Large-Scale Analysis of Mixed Initiative in Information-Seeking Dialogues for Conversational Search

59 0 0.0 ( 0 )

Download Cite

Added by Svitlana Vakulenko

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Svitlana Vakulenko - Evangelos Kanoulas - Maarten de Rijke

Information Retrieval

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Conversational search is a relatively young area of research that aims at automating an information-seeking dialogue. In this paper we help to position it with respect to other research areas within conversational Artificial Intelligence (AI) by analysing the structural properties of an information-seeking dialogue. To this end, we perform a large-scale dialogue analysis of more than 150K transcripts from 16 publicly available dialogue datasets. These datasets were collected to inform different dialogue-based tasks including conversational search. We extract different patterns of mixed initiative from these dialogue transcripts and use them to compare dialogues of different types. Moreover, we contrast the patterns found in information-seeking dialogues that are being used for research purposes with the patterns found in virtual reference interviews that were conducted by professional librarians. The insights we provide (1) establish close relations between conversational search and other conversational AI tasks; and (2) uncover limitations of existing conversational datasets to inform future data collection tasks.

rate research

An Analysis of Mixed Initiative and Collaboration in Information-Seeking Dialogues

195 - Svitlana Vakulenko , Evangelos Kanoulas , Maarten de Rijke 2020

The ability to engage in mixed-initiative interaction is one of the core requirements for a conversational search system. How to achieve this is poorly understood. We propose a set of unsupervised metrics, termed ConversationShape, that highlights the role each of the conversation participants plays by comparing the distribution of vocabulary and utterance types. Using ConversationShape as a lens, we take a closer look at several conversational search datasets and compare them with other dialogue datasets to better understand the types of dialogue interaction they represent, either driven by the information seeker or the assistant. We discover that deviations from the ConversationShape of a human-human dialogue of the same type is predictive of the quality of a human-machine dialogue.

Information Retrieval

Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

110 - Pedro Rodriguez , Paul Crook , Seungwhan Moon 2020

Open-ended human learning and information-seeking are increasingly mediated by digital assistants. However, such systems often ignore the users pre-existing knowledge. Assuming a correlation between engagement and user responses such as liking messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know. Through crowd-sourcing of this experiment, we collect and release 14K dialogs (181K utterances) where users and assistants converse about geographic topics like geopolitical entities and locations. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. Responses using a users prior knowledge increase engagement. We incorporate this knowledge into a multi-task model that reproduces human assistant policies and improves over a BERT content model by 13 mean reciprocal rank points.

Computation and Language

WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

45 - Jibril Frej , Didier Schwab , Jean-Pierre Chevallet 2019

Over the past years, deep learning methods allowed for new state-of-the-art results in ad-hoc information retrieval. However such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for information retrieval perform poorly on these datasets. These models (e.g. DUET, Conv-KNRM) are trained and evaluated on data collected from commercial search engines not publicly available for academic research which is a problem for reproducibility and the advancement of research. In this paper, we propose WIKIR: an open-source toolkit to automatically build large-scale English information retrieval datasets based on Wikipedia. WIKIR is publicly available on GitHub. We also provide wikIR78k and wikIRS78k: two large-scale publicly available datasets that both contain 78,628 queries and 3,060,191 (query, relevant documents) pairs.

Information Retrieval

Meta-evaluation of Conversational Search Evaluation Metrics

99 - Zeyang Liu , Ke Zhou , Max L. Wilson 2021

Conversational search systems, such as Google Assistant and Microsoft Cortana, enable users to interact with search systems in multiple rounds through natural language dialogues. Evaluating such systems is very challenging given that any natural language responses could be generated, and users commonly interact for multiple semantically coherent rounds to accomplish a search task. Although prior studies proposed many evaluation metrics, the extent of how those measures effectively capture user preference remains to be investigated. In this paper, we systematically meta-evaluate a variety of conversational search metrics. We specifically study three perspectives on those metrics: (1) reliability: the ability to detect actual performance differences as opposed to those observed by chance; (2) fidelity: the ability to agree with ultimate user preference; and (3) intuitiveness: the ability to capture any property deemed important: adequacy, informativeness, and fluency in the context of conversational search. By conducting experiments on two test collections, we find that the performance of different metrics varies significantly across different scenarios whereas consistent with prior studies, existing metrics only achieve a weak correlation with ultimate user preference and satisfaction. METEOR is, comparatively speaking, the best existing single-turn metric considering all three perspectives. We also demonstrate that adapted session-based evaluation metrics can be used to measure multi-turn conversational search, achieving moderate concordance with user satisfaction. To our knowledge, our work establishes the most comprehensive meta-evaluation for conversational search to date.

Information Retrieval Artificial Intelligence Machine Learning

Mixed-Initiative Interaction = Mixed Computation

90 - Naren Ramakrishnan , Robert Capra , 2001

We show that partial evaluation can be usefully viewed as a programming model for realizing mixed-initiative functionality in interactive applications. Mixed-initiative interaction between two participants is one where the parties can take turns at any time to change and steer the flow of interaction. We concentrate on the facet of mixed-initiative referred to as `unsolicited reporting and demonstrate how out-of-turn interactions by users can be modeled by `jumping ahead to nested dialogs (via partial evaluation). Our approach permits the view of dialog management systems in terms of their native support for staging and simplifying interactions; we characterize three different voice-based interaction technologies using this viewpoint. In particular, we show that the built-in form interpretation algorithm (FIA) in the VoiceXML dialog management architecture is actually a (well disguised) combination of an interpreter and a partial evaluator.

Programming Languages Human-Computer Interaction