On Inductive Biases for Heterogeneous Treatment Effect Estimation

Published by Alicia Curth

Publication date: 2021

Research field: Mathematical statistics, Informatics engineering

Paper language: English

Authors: Alicia Curth, Mihaela van der Schaar

Machine learning

Abstract

We investigate how to exploit structural similarities of an individual's potential outcomes (POs) under different treatments to obtain better estimates of conditional average treatment effects in finite samples. Especially when it is unknown whether a treatment has an effect at all, it is natural to hypothesize that the POs are similar. Yet some existing strategies for treatment effect estimation employ regularization schemes that implicitly encourage heterogeneity even when it does not exist and fail to fully make use of shared structure. In this paper, we investigate and compare three end-to-end learning strategies to overcome this problem (based on regularization, reparametrization and a flexible multi-task architecture), each encoding inductive bias favoring shared behavior across POs. To build understanding of their relative strengths, we implement all strategies using neural networks and conduct a wide range of semi-synthetic experiments. We observe that all three approaches can lead to substantial improvements upon numerous baselines and gain insight into performance differences across various experimental settings.
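To make the reparametrization idea concrete, here is a minimal sketch. It is not the authors' neural-network implementation: it assumes linear models, a binary treatment indicator `w`, and a ridge penalty, and the function names `fit_reparametrized_cate` and `predict_cate` are hypothetical. The treated outcome is written as mu_1(x) = mu_0(x) + tau(x), and the strong penalty is placed only on the offset tau, so the fit is pulled toward "no effect" unless the data say otherwise.

```python
import numpy as np


def fit_reparametrized_cate(X, w, y, lam_tau=10.0, lam_mu=1e-3):
    """Fit y ~ mu_0(x) + w * tau(x) with linear mu_0 and tau.

    Illustrative assumption: ridge regression where the offset tau is
    penalised by lam_tau and the baseline mu_0 only by a tiny lam_mu.
    Returns (beta_mu, beta_tau), each of length d + 1 (last entry = intercept).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # features plus intercept column
    Z = np.hstack([Xb, w[:, None] * Xb])      # [mu_0 block | tau block]
    penalty = np.concatenate([np.full(d + 1, lam_mu), np.full(d + 1, lam_tau)])
    # Closed-form ridge solution with a per-coefficient penalty.
    beta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return beta[: d + 1], beta[d + 1:]


def predict_cate(X, beta_tau):
    """Estimated treatment effect tau(x) for each row of X."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ beta_tau
```

With a large `lam_tau` the estimated effect collapses toward zero, and with a small one the model behaves like two separately fitted outcome regressions. That is the trade-off the shared-structure inductive bias is meant to control.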

Xiaoqing Tan, Chung-Chou H. Chang, Lu Tang (2021)

Federated learning is an appealing framework for analyzing sensitive data from distributed health data networks. Under this framework, data partners at local sites collaboratively build an analytical model under the orchestration of a coordinating site, while keeping the data decentralized. While integrating information from multiple sources may boost statistical efficiency, existing federated learning methods mainly assume data across sites are homogeneous samples of the global population, failing to properly account for the extra variability across sites in estimation and inference. Drawing on a multi-hospital electronic health records network, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators to join results across hospital sites, while actively modeling the heterogeneity in data sources through site partitioning. The efficiency of this approach is demonstrated by a study of causal effects of oxygen saturation on hospital mortality and backed up by comprehensive numerical results.

Machine learning, Methodology

Adaptive Experimental Design for Efficient Treatment Effect Estimation

Masahiro Kato, Takuya Ishihara, Junya Honda (2020)

The goal of many scientific experiments including A/B testing is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently. However, with this approach, it is difficult to apply a standard statistical method to construct an estimator because the observations are not independent and identically distributed. We thus propose an algorithm for efficient experiments with estimators constructed from dependent samples. We also introduce a sequential testing framework using the proposed estimator. To justify our proposed approach, we provide finite and infinite sample analyses. Finally, we experimentally show that the proposed algorithm exhibits preferable performance.
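For orientation, the standard baseline that the abstract contrasts with can be sketched as follows. This is not the proposed adaptive algorithm: it assumes two treatment arms and independent, identically distributed observations, which is exactly the assumption that fails under adaptive assignment. The function name `difference_in_means` is hypothetical.

```python
import numpy as np


def difference_in_means(y, w):
    """Estimate the ATE as mean(y | treated) - mean(y | control).

    Assumes i.i.d. observations and a binary treatment indicator w.
    Returns the point estimate and its unpooled (Welch-style) standard error.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w).astype(bool)
    y1, y0 = y[w], y[~w]
    ate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return ate, se
```

When assignment probabilities depend on past observations, this standard error is no longer justified by the usual i.i.d. argument, which is why estimators built for dependent samples are needed.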

Machine learning, Econometrics

Estimation of Local Average Treatment Effect by Data Combination

Kazuhiko Shinoda, Takahiro Hoshino (2021)

It is important to estimate the local average treatment effect (LATE) when compliance with a treatment assignment is incomplete. The previously proposed methods for LATE estimation required all relevant variables to be jointly observed in a single dataset; however, it is sometimes difficult or even impossible to collect such data in many real-world problems for technical or privacy reasons. We consider a novel problem setting in which LATE, as a function of covariates, is nonparametrically identified from the combination of separately observed datasets. For estimation, we show that the direct least squares method, which was originally developed for estimating the average treatment effect under complete compliance, is applicable to our setting. However, model selection and hyperparameter tuning for the direct least squares estimator can be unstable in practice since it is defined as a solution to the minimax problem. We then propose a weighted least squares estimator that enables simpler model selection by avoiding the minimax objective formulation. Unlike the inverse probability weighted (IPW) estimator, the proposed estimator directly uses the pre-estimated weight without inversion, avoiding the problems caused by the IPW methods. We demonstrate the effectiveness of our method through experiments using synthetic and real-world datasets.

Machine learning

On the Bias Against Inductive Biases

George Cazenavette, Simon Lucey (2021)

Borrowing from the transformer models that revolutionized the field of natural language processing, self-supervised feature learning for visual tasks has also seen state-of-the-art success using these extremely deep, isotropic networks. However, the typical AI researcher does not have the resources to evaluate, let alone train, a model with several billion parameters and quadratic self-attention activations. To facilitate further research, it is necessary to understand the features of these huge transformer models that can be adequately studied by the typical researcher. One interesting characteristic of these transformer models is that they remove most of the inductive biases present in classical convolutional networks. In this work, we analyze the effect of these and more inductive biases on small to moderately-sized isotropic networks used for unsupervised visual feature learning and show that their removal is not always ideal.

Computer vision and pattern recognition, Machine learning

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels

Vira Semenova, Matt Goldman, Victor Chernozhukov (2017)

This paper provides estimation and inference methods for a large number of heterogeneous treatment effects in the presence of an even larger number of controls and unobserved unit heterogeneity. In our main example, the vector of heterogeneous treatments is generated by interacting the base treatment variable with a subset of controls. We first estimate the unit-specific expectation functions of the outcome and each treatment interaction conditional on controls and take the residuals. Second, we report the Lasso (L1-regularized least squares) estimate of the heterogeneous treatment effect parameter, regressing the outcome residual on the vector of treatment residuals. We debias the Lasso estimator to conduct simultaneous inference on the target parameter by Gaussian bootstrap. We account for the unobserved unit heterogeneity by projecting it onto the time-invariant covariates, following the correlated random effects approach of Mundlak (1978) and Chamberlain (1982). We demonstrate our method by estimating price elasticities of groceries based on scanner data.
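The two-step structure described above (residualize, then Lasso) can be illustrated with a simplified sketch. Several things here are assumptions rather than the paper's procedure: the first stage uses plain least squares on the controls instead of the paper's unit-specific expectation estimators, the Lasso is solved by a basic coordinate-descent loop, and the debiasing, Gaussian bootstrap, and correlated random effects steps are omitted. The names `residualize`, `lasso_cd`, and `partial_out_lasso` are hypothetical.

```python
import numpy as np


def residualize(A, C):
    """Residuals of A (vector or matrix) after least-squares projection on C."""
    coef, *_ = np.linalg.lstsq(C, A, rcond=None)
    return A - C @ coef


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.

    Plain cyclic coordinate descent with soft-thresholding updates.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()     # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]               # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]               # add the updated contribution back
    return b


def partial_out_lasso(y, D, C, alpha):
    """Step 1: partial the controls C out of the outcome y and the treatment
    interactions D. Step 2: Lasso of outcome residuals on treatment residuals."""
    y_res = residualize(np.asarray(y, dtype=float), C)
    D_res = residualize(np.asarray(D, dtype=float), C)
    return lasso_cd(D_res, y_res, alpha)
```

Here `D` would hold the base treatment multiplied by each selected control, and `C` the controls including an intercept column. The returned coefficients are shrunk by the L1 penalty, which is why a debiasing step is needed before inference.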

Machine learning