Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models

Published by: Soya Park

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Soya Park, April Wang, Ban Kawas

Human-Computer Interaction, Machine Learning

Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge with data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. The design of Ziva is informed by preliminary interviews with data scientists, conducted to understand current practices in the domain knowledge acquisition process for ML development projects. To assess our design, we run a mixed-method case study to evaluate how Ziva can facilitate interaction between domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge. We conclude this work by experimenting with building NLP models using the Ziva output from our case study.

Michael Winter, Rudiger Pryss, Thomas Probst 2021

The comprehension of business process models is crucial for enterprises. Prior research has shown that children as well as adolescents perceive and interpret graphical representations differently from grown-ups. To evaluate this, this paper presents observations in the context of business process models, obtained from a study on visual literacy in cultural education. We demonstrate that adolescents without expertise in process model comprehension are able to correctly interpret business process models expressed in terms of BPMN 2.0. In a comprehensive study, n = 205 learners (i.e., pupils at the age of 15) needed to answer questions related to process models they were confronted with, reflecting different levels of complexity. In addition, process models were created with varying styles of element labels. Study results indicate that an abstract description (i.e., using only alphabetic letters) of process models is understood more easily than concrete or pseudo descriptions. As a benchmark, results are compared with those of modeling experts (n = 40). Among other things, study findings suggest using abstract descriptions in order to introduce novices to process modeling notations. With the obtained insights, we highlight that process models can be properly comprehended by novices.

Human-Computer Interaction

How Data Scientists Work Together With Domain Experts in Scientific Collaborations: To Find The Right Answer Or To Ask The Right Question?

Yaoli Mao, Dakuo Wang, Michael Muller 2019

In recent years there has been an increasing trend in which data scientists and domain experts work together to tackle complex scientific questions. However, such collaborations often face challenges. In this paper, we aim to decipher this collaboration complexity through a semi-structured interview study with 22 interviewees from teams of bio-medical scientists collaborating with data scientists. In the analysis, we adopt the Olsons' four-dimension framework proposed in "Distance Matters" to code interview transcripts. Our findings suggest that, besides the glitches in the collaboration readiness, technology readiness, and coupling of work dimensions, the tensions that exist in the common ground building process influence the collaboration outcomes, and then persist in the actual collaboration process. In contrast to prior work's general account of building a high level of common ground, the breakdowns of content common ground together with the strengthening of process common ground in this process are more beneficial for scientific discovery. We discuss why that is and what the design suggestions are, and conclude the paper with future directions and limitations.

Computers and Society, Artificial Intelligence, Human-Computer Interaction

Tree-Structured Semantic Encoder with Knowledge Sharing for Domain Adaptation in Natural Language Generation

Bo-Hsiang Tseng, Paweł Budzianowski, Yen-Chen Wu 2019

Domain adaptation in natural language generation (NLG) remains challenging because of the high complexity of input semantics across domains and the limited data of a target domain. This is particularly the case for dialogue systems, where we want to be able to seamlessly include new domains into the conversation. Therefore, it is crucial for generation models to share knowledge across domains for effective adaptation from one domain to another. In this study, we exploit a tree-structured semantic encoder to capture the internal structure of complex semantic representations required for multi-domain dialogues in order to facilitate knowledge sharing across domains. In addition, a layer-wise attention mechanism between the tree encoder and the decoder is adopted to further improve the model's capability. The automatic evaluation results show that our model outperforms previous methods in terms of the BLEU score and the slot error rate, in particular when the adaptation data is limited. In subjective evaluation, human judges tend to prefer the sentences generated by our model, rating them more highly on informativeness and naturalness than other systems.

Computation and Language, Machine Learning

An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists

Frederic Chazal 2017

Topological Data Analysis (TDA) is a recent and fast-growing field providing a set of new topological and geometric tools to infer relevant features from possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of TDA for non-experts.

Statistics Theory, Machine Learning, Algebraic Topology

Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI

Dakuo Wang, Justin D. Weisz, Michael Muller 2019

The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Though AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.

Human-Computer Interaction, Artificial Intelligence, Machine Learning