External Validity: From Do-Calculus to Transportability Across Populations


Published by: Judea Pearl

Publication date: 2015

Research field: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Judea Pearl, Elias Bareinboim

Methodology, Artificial Intelligence


The generalizability of empirical findings to new environments, settings or populations, often called external validity, is essential in most scientific explorations. This paper treats a particular problem of generalizability, called transportability, defined as a license to transfer causal effects learned in experimental studies to a new population, in which only observational studies can be conducted. We introduce a formal representation called selection diagrams for expressing knowledge about differences and commonalities between populations of interest and, using this representation, we reduce questions of transportability to symbolic derivations in the do-calculus. This reduction yields graph-based procedures for deciding, prior to observing any data, whether causal effects in the target population can be inferred from experimental findings in the study population. When the answer is affirmative, the procedures identify what experimental and observational findings need be obtained from the two populations, and how they can be combined to ensure bias-free transport.
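As a rough illustration of what "combining" experimental and observational findings can look like, the simplest transport formula of this kind re-weights covariate-specific experimental effects from the study population by the covariate distribution observed in the target population: P*(y | do(x)) = sum over z of P(y | do(x), z) P*(z). This holds only under the assumption that the two populations differ solely in the distribution of the covariate Z. The sketch below is a minimal numeric example of that one formula, not the paper's general procedure; the function name, the age strata, and all numbers are hypothetical.

```python
def transport_effect(effect_by_z, target_p_z):
    """Re-weight z-specific experimental effects by the target distribution of z.

    effect_by_z: P(y | do(x), z) per stratum z, estimated in the study population.
    target_p_z:  P*(z) per stratum z, observed in the target population.
    Assumes the populations differ only in the distribution of z.
    """
    if set(effect_by_z) != set(target_p_z):
        raise ValueError("effect_by_z and target_p_z must cover the same strata")
    if abs(sum(target_p_z.values()) - 1.0) > 1e-9:
        raise ValueError("target_p_z must sum to 1")
    return sum(effect_by_z[z] * target_p_z[z] for z in target_p_z)


if __name__ == "__main__":
    # Hypothetical stratum-specific effects from a randomized study.
    effect_by_age = {"young": 0.30, "old": 0.60}
    # Hypothetical age distributions in the study and target populations.
    study_p_age = {"young": 0.8, "old": 0.2}
    target_p_age = {"young": 0.4, "old": 0.6}
    print(transport_effect(effect_by_age, study_p_age))
    print(transport_effect(effect_by_age, target_p_age))
```

With these made-up numbers, the same stratum-specific effects yield a different overall effect in the target population because its age mix differs from that of the study population.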


77 - Sudipto Mukherjee , Subhabrata Mukherjee , Marcello Hasegawa 2020

Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks. In this work, we explore a new application, Smart-To-Do, that helps users with task management over emails. We introduce a new task and dataset for automatically generating To-Do items from emails where the sender has promised to perform an action. We design a two-stage process leveraging recent advances in neural text generation and sequence-to-sequence learning, obtaining BLEU and ROUGE scores of 0.23 and 0.63 for this task. To the best of our knowledge, this is the first work to address the problem of composing To-Do items from emails.

Computation and Language, Artificial Intelligence, Machine Learning

What to do if N is two?

147 - Pascal Fries (Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society) 2021

The field of in-vivo neurophysiology currently uses statistical standards that are based on tradition rather than formal analysis. Typically, data from two (or few) animals are pooled for one statistical test, or a significant test in a first animal is replicated in one (or few) further animals. The use of more than one animal is widely believed to allow an inference on the population. Here, we explain that a useful inference on the population would require larger numbers and a different statistical approach. The field should consider performing studies at that standard, potentially through coordinated multi-center efforts, for selected questions of exceptional importance. Yet, for many questions, this is not ethically or economically justifiable. We explain why, in studies with two (or few) animals, any useful inference is limited to the sample of investigated animals, irrespective of whether it is based on few animals, two animals or a single animal.

Methodology, Statistics Applications

A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

126 - Teruki Hayakawa , Masateru Tsunoda , Koji Toda 2020

Various software fault prediction models and techniques for building them have been proposed. Many studies have compared and evaluated them to identify the most effective ones. However, in most cases, no single model or technique performs best on every dataset. Software development datasets are diverse, so there is a risk that the selected model or technique performs poorly on a given dataset. To avoid selecting a low-accuracy model, we apply bandit algorithms to predict faults. Consider a case where a player has 100 coins to bet on several slot machines. Ordinary usage of software fault prediction is analogous to the player betting all 100 coins on one slot machine. In contrast, bandit algorithms bet one coin at a time on the machines (i.e., the prediction models), step by step, to seek the best machine. In the experiment, we developed an artificial dataset that includes 100 modules, 15 of which include faults. Then, we developed various artificial fault prediction models and selected them dynamically using bandit algorithms. The Thompson sampling algorithm showed the best or second-best prediction performance compared with using only one prediction model.

Software Engineering, Machine Learning
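The coin-betting analogy in the abstract above can be made concrete with a small Beta-Bernoulli Thompson sampling sketch. It is only an illustration of the general technique, not the authors' experimental setup: each "arm" stands for a hypothetical prediction model, `success_probs` holds assumed per-model probabilities of a correct prediction, and the function name and parameters are invented for this example.

```python
import random


def thompson_sampling(success_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return how often each arm was chosen.

    success_probs: assumed probability that each arm (model) predicts correctly.
    n_rounds:      number of coins to bet, one per round.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    wins = [0] * n_arms    # correct predictions observed per arm
    losses = [0] * n_arms  # incorrect predictions observed per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible accuracy per arm from its Beta(wins+1, losses+1) posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)]
        # Bet this round's coin on the arm with the highest sampled accuracy.
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulate whether the chosen model's prediction was correct.
        if rng.random() < success_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls


if __name__ == "__main__":
    # Three hypothetical models with assumed accuracies.
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Because the posterior for a weak arm concentrates on low values after a few failures, most later coins go to the arm that has been performing best, while occasional draws still revisit the others.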

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

167 - Ajay Mandlekar , Danfei Xu , Roberto Martin-Martin 2020

Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure, where demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen start and goal state combinations. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections so that it can compose behaviors from different demonstration trajectories. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in the real world. Additional results and videos are available at https://sites.google.com/view/gti2020/ .

Robotics, Artificial Intelligence, Machine Learning

Do we need to estimate the variance in robust mean estimation?

159 - Qiang Sun 2021

This paper studies robust mean estimators for distributions with only finite variances. We propose a new loss function that is a function of the mean parameter and a robustification parameter. By simultaneously optimizing the empirical loss with respect to both parameters, we show that the resulting estimator for the robustification parameter can automatically adapt to the data and the unknown variance. Thus the resulting mean estimator can achieve near-optimal finite-sample performance. Compared with prior work, our method is computationally efficient and user-friendly. It does not need cross-validation to tune the robustification parameter.

Methodology, Statistics Theory
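To show what a "robustification parameter" does in practice, here is a minimal sketch of the classical Huber mean estimator with a fixed, hand-picked parameter `tau`. This is the standard baseline, not the loss proposed in the abstract above, whose point is precisely to avoid choosing `tau` by hand; the function name, the iteration scheme (iteratively reweighted averaging started from the median), and the sample data are assumptions made for illustration.

```python
from statistics import median


def huber_mean(xs, tau, n_iter=100, tol=1e-10):
    """Estimate a location parameter with the Huber loss and a fixed tau.

    Points within tau of the current estimate get full weight; points further
    away are down-weighted in proportion to tau / |residual|.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    mu = median(xs)  # robust starting point
    for _ in range(n_iter):
        weights = [1.0 if abs(x - mu) <= tau else tau / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
    print(sum(data) / len(data))        # plain mean, pulled toward the outlier
    print(huber_mean(data, tau=1.0))    # Huber estimate, close to the bulk of the data
```

A small `tau` makes the estimator behave like the median and a very large `tau` recovers the ordinary mean, which is why the choice of this parameter matters.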