
Estimating Average Treatment Effects with Support Vector Machines

Published by Alexander Tarr
Publication date: 2021
Research field: Mathematical Statistics
Paper language: English


Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups while simultaneously maximizing effective sample size. We also show that SVM is a continuous relaxation of the quadratic integer program for computing the largest balanced subset, establishing its direct relation to the cardinality matching method. Another important feature of SVM is that the regularization parameter controls the trade-off between covariate balance and effective sample size. As a result, the existing SVM path algorithm can be used to compute the balance-sample size frontier. We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods. Finally, we conduct simulation and empirical studies to evaluate the performance of the proposed methodology and find that SVM is competitive with the state-of-the-art covariate balancing methods.
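The balancing objective described above centers on the maximum mean discrepancy (MMD) between the treatment and control covariate distributions. As a minimal illustration of that quantity only (not the paper's weighted SVM procedure), the biased squared-MMD estimator under an RBF kernel can be computed as follows; all variable names here are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def mmd_squared(X_treat, X_ctrl, gamma=1.0):
    # Biased estimator of the squared maximum mean discrepancy
    # between the treated and control covariate samples.
    Ktt = rbf_kernel(X_treat, X_treat, gamma)
    Kcc = rbf_kernel(X_ctrl, X_ctrl, gamma)
    Ktc = rbf_kernel(X_treat, X_ctrl, gamma)
    return Ktt.mean() - 2.0 * Ktc.mean() + Kcc.mean()

rng = np.random.default_rng(0)
X_t = rng.normal(0.5, 1.0, size=(100, 3))  # treated covariates (mean-shifted)
X_c = rng.normal(0.0, 1.0, size=(100, 3))  # control covariates
imbalance = mmd_squared(X_t, X_c)
```

A weighting procedure such as the one proposed in the paper would then choose unit-level weights to drive a weighted version of this discrepancy toward zero while keeping the effective sample size large.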


Read also

We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity weighting are not designed to handle missing values, which are however common in both data sources. In addition to coupling the assumptions for causal effect identifiability and for the missing values mechanism and to defining appropriate estimation strategies, one difficulty is to consider the specific structure of the data with two sources and treatment and outcome only available in the RCT. We propose and compare three multiple imputation strategies (separate imputation, joint imputation with fixed effect, joint imputation without source information), as well as a technique that uses estimators that can handle missing values directly without imputing them. These methods are assessed in an extensive simulation study, showing the empirical superiority of fixed-effect multiple imputation followed by any complete-data generalizing estimator. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality. The analysis illustrates how the handling of missing values can affect the conclusions about the effect generalized from the RCT to the target population.
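The "joint imputation with fixed effect" strategy above can be pictured as stacking the two data sources, appending a source indicator, and letting the imputation model condition on it. The following is a deliberately simplified numpy sketch, with source-specific mean imputation standing in for full multiple imputation; the function name and data are illustrative, not from the paper:

```python
import numpy as np

def joint_impute_fixed_effect(X_rct, X_obs):
    # Stack both sources and impute each column with a source-specific
    # mean -- the source indicator acts as the "fixed effect".
    X = np.vstack([X_rct, X_obs])
    src = np.r_[np.zeros(len(X_rct)), np.ones(len(X_obs))]
    out = X.copy()
    for j in range(X.shape[1]):
        for s in (0, 1):
            rows = src == s
            mean_s = np.nanmean(X[rows, j])     # mean of observed values in source s
            miss = rows & np.isnan(X[:, j])     # missing entries in source s
            out[miss, j] = mean_s
    return out
```

A full multiple-imputation procedure would instead draw several completed datasets from a joint model with the source indicator as a covariate and pool the downstream estimates.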
Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects. In our experiments, we find our approach to perform well relative to a number of baselines.
The Cox regression model and its associated hazard ratio (HR) are frequently used for summarizing the effect of treatments on time-to-event outcomes. However, the HR's interpretation strongly depends on the assumed underlying survival model. The challenge of interpreting the HR has been the focus of a number of recent works, and several alternative measures have been proposed to address these concerns. Marginal Cox regression models include an identifiable hazard ratio with a population-level, rather than individual-level, causal interpretation. In this work, we study the properties of one particular marginal Cox regression model and consider its estimation in the presence of an omitted confounder. We prove the large-sample consistency of an estimating score that allows non-binary treatments. Our Monte Carlo simulations suggest that the finite-sample behavior of the procedure is adequate. The studied estimator is more robust than its competitors for weak instruments, although it is slightly more biased for large treatment effects. The practical use of the presented techniques is illustrated with a real example using data from the Vascular Quality Initiative registry. The R code used is provided as Supplementary Material.
Shuo Sun, Erica E. M. Moodie, 2021
Analyses of environmental phenomena often are concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite-sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
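The weighted-QTE idea above — comparing quantiles of the outcome under inverse-propensity balancing weights — can be sketched for a binary exposure with known propensity scores. This is a generic illustration under assumed simulated data, not the estimators studied in the paper:

```python
import numpy as np

def weighted_quantile(y, w, q):
    # q-quantile of y under (unnormalized) weights w.
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum = np.cumsum(w_sorted) / w_sorted.sum()
    return y_sorted[np.searchsorted(cum, q)]

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))            # true propensity of exposure
a = rng.binomial(1, ps)                  # binary exposure
y = x + 2.0 * a + rng.normal(size=n)     # outcome; constant effect of 2

w1 = a / ps                              # balancing weights, exposed
w0 = (1 - a) / (1 - ps)                  # balancing weights, unexposed
qte_95 = weighted_quantile(y, w1, 0.95) - weighted_quantile(y, w0, 0.95)
```

Because the simulated treatment effect is constant, the 95% QTE here equals the mean effect (2); with heterogeneous effects the two would differ, which is the motivation for quantile-scale estimands.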
A widely used tool for binary classification is the Support Vector Machine (SVM), a supervised learning technique that finds the maximum-margin linear separator between the two classes. While SVMs have been well studied in the batch (offline) setting, there is considerably less work on the streaming (online) setting, which requires only a single pass over the data using sub-linear space. Existing streaming algorithms are not yet competitive with the batch implementation. In this paper, we use the formulation of the SVM as a minimum enclosing ball (MEB) problem to provide a streaming SVM algorithm based on the blurred ball cover originally proposed by Agarwal and Sharathkumar. Our implementation consistently outperforms existing streaming SVM approaches and provides higher accuracies than libSVM on several datasets, making it competitive with the standard SVM batch implementation.
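The SVM-to-MEB reduction above ultimately requires computing minimum enclosing balls, and ball-cover constructions build on core-set iterations such as the classic Badoiu–Clarkson scheme. A generic sketch of that iteration (not the paper's streaming algorithm; the function name is illustrative):

```python
import numpy as np

def minimum_enclosing_ball(P, iters=500):
    # Badoiu-Clarkson iteration: repeatedly step the center toward the
    # current farthest point with a shrinking step size, yielding a
    # (1 + O(1/sqrt(iters)))-approximate minimum enclosing ball.
    c = P.mean(axis=0)
    for t in range(1, iters + 1):
        far = np.argmax(((P - c) ** 2).sum(axis=1))  # farthest point from c
        c = c + (P[far] - c) / (t + 1)               # step toward it
    radius = np.sqrt(((P - c) ** 2).sum(axis=1).max())
    return c, radius

# Usage: points on the unit circle have center ~0 and radius ~1.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
center, radius = minimum_enclosing_ball(circle)
```

The appeal of core-set methods in the streaming setting is that only a small subset of points (plus the current center) must be retained, which is what makes sub-linear space possible.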