Reproducing Kernel Methods for Nonparametric and Semiparametric Treatment Effects

Published by: Rahul Singh

Publication date: 2020

Research field: Economics, Informatics Engineering

Paper language: English

Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

Econometrics, Machine Learning, Statistics Theory

We propose a family of reproducing kernel ridge estimators for nonparametric and semiparametric policy evaluation. The framework includes (i) treatment effects of the population, of subpopulations, and of alternative populations; (ii) the decomposition of a total effect into a direct effect and an indirect effect (mediated by a particular mechanism); and (iii) effects of sequences of treatments. Treatment and covariates may be discrete or continuous, and low, high, or infinite dimensional. We consider estimation of means, increments, and distributions of counterfactual outcomes. Each estimator is an inner product in a reproducing kernel Hilbert space (RKHS), with a one-line, closed-form solution. For the nonparametric case, we prove uniform consistency and provide finite-sample rates of convergence. For the semiparametric case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency by finite-sample arguments. We evaluate our estimators in simulations, then estimate continuous, heterogeneous, incremental, and mediated treatment effects of the US Job Corps training program for disadvantaged youth.
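To make the "inner product in an RKHS with a closed-form solution" idea concrete, here is a minimal sketch of a plug-in kernel ridge estimate of a continuous-treatment mean effect E[Y(d)]. It is an illustration under stated assumptions, not the authors' exact estimator: it assumes selection on observables, a product of Gaussian kernels over treatment and covariates, and a fixed lengthscale and ridge penalty. The names `rbf_kernel`, `dose_response`, `lengthscale`, and `lam` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def dose_response(d_grid, D, X, Y, lam=1e-3):
    """Plug-in estimate of E[Y(d)] for each d in d_grid.

    Fits kernel ridge regression of Y on (D, X) with a product kernel,
    then averages the fitted surface over the observed covariates.
    Assumption: fixed lengthscale 1.0 and ridge penalty lam (untuned).
    """
    n = len(Y)
    K_DD = rbf_kernel(D, D)
    K_XX = rbf_kernel(X, X)
    # Ridge weights: alpha = (K_DD * K_XX + n * lam * I)^{-1} Y,
    # where * is the elementwise (Hadamard) product of kernel matrices.
    alpha = np.linalg.solve(K_DD * K_XX + n * lam * np.eye(n), Y)
    K_Dd = rbf_kernel(D, d_grid)      # shape (n, m)
    mean_KX = K_XX.mean(axis=1)       # (1/n) * sum_i k_X(x_j, x_i), shape (n,)
    # theta(d) = sum_j alpha_j * k_D(d_j, d) * mean_i k_X(x_j, x_i)
    return (alpha * mean_KX) @ K_Dd

# Usage on simulated data (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
D = rng.uniform(size=n)
Y = D + X[:, 0] + 0.1 * rng.normal(size=n)
d_grid = np.array([0.2, 0.8])
theta = dose_response(d_grid, D, X, Y)
```

The whole estimate reduces to one linear solve followed by a matrix-vector product, which is what makes a one-line closed form possible. In practice the lengthscales and `lam` would need tuning, for example by cross-validation.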

Yuya Sasaki, Takuya Ura, 2018

The policy relevant treatment effect (PRTE) measures the average effect of switching from a status-quo policy to a counterfactual policy. Estimation of the PRTE involves estimation of multiple preliminary parameters, including propensity scores, conditional expectation functions of the outcome and covariates given the propensity score, and marginal treatment effects. These preliminary estimators can affect the asymptotic distribution of the PRTE estimator in complicated and intractable ways. In this light, we propose an orthogonal score for double debiased estimation of the PRTE, whereby the asymptotic distribution of the PRTE estimator is obtained without any influence of the preliminary parameter estimators, as long as they satisfy mild requirements on convergence rates. To our knowledge, this paper is the first to develop limit distribution theories for inference about the PRTE.

Econometrics

When Do Neural Networks Outperform Kernel Methods?

Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, 2020

For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If feature vectors are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the feature vectors display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present a model that can capture in a unified framework both behaviors observed in earlier work. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.

Machine Learning, Statistics Theory

An Online Projection Estimator for Nonparametric Regression in Reproducing Kernel Hilbert Spaces

Tianyu Zhang, Noah Simon, 2021

The goal of nonparametric regression is to recover an underlying regression function from noisy observations, under the assumption that the regression function belongs to a pre-specified infinite dimensional function space. In the online setting, when the observations come in a stream, it is generally computationally infeasible to refit the whole model repeatedly. There are as of yet no methods that are both computationally efficient and statistically rate-optimal. In this paper, we propose an estimator for online nonparametric regression. Notably, our estimator is an empirical risk minimizer (ERM) in a deterministic linear space, which is quite different from existing methods using random features and functional stochastic gradient. Our theoretical analysis shows that this estimator obtains rate-optimal generalization error when the regression function is known to live in a reproducing kernel Hilbert space. We also show, theoretically and empirically, that the computational expense of our estimator is much lower than other rate-optimal estimators proposed for this online setting.

Methodology

Testing for Unobserved Heterogeneous Treatment Effects with Observational Data

Yu-Chin Hsu, Ta-Cheng Huang, 2018

Unobserved heterogeneous treatment effects have been emphasized in the recent policy evaluation literature (see e.g., Heckman and Vytlacil, 2005). This paper proposes a nonparametric test for unobserved heterogeneous treatment effects in a treatment effect model with a binary treatment assignment, allowing for individuals' self-selection into the treatment. Under the standard local average treatment effects assumptions, i.e., the no defiers condition, we derive testable model restrictions for the hypothesis of unobserved heterogeneous treatment effects. We also show that if the treatment outcomes satisfy a monotonicity assumption, these model restrictions are also sufficient. Then, we propose a modified Kolmogorov-Smirnov-type test which is consistent and simple to implement. Monte Carlo simulations show that our test performs well in finite samples. For illustration, we apply our test to study heterogeneous treatment effects of the Job Training Partnership Act on earnings and the impacts of fertility on family income, where the null hypothesis of homogeneous treatment effects is rejected in the second application but not in the first.

Econometrics

Welfare Analysis via Marginal Treatment Effects

Yuya Sasaki, Takuya Ura, 2020

Consider a causal structure with endogeneity (i.e., unobserved confoundedness) in empirical data, where an instrumental variable is available. In this setting, we show that the mean social welfare function can be identified and represented via the marginal treatment effect (MTE, Bjorklund and Moffitt, 1987) as the operator kernel. This representation result can be applied to a variety of statistical decision rules for treatment choice, including plug-in rules, Bayes rules, and empirical welfare maximization (EWM) rules as in Hirano and Porter (2020, Section 2.3). Focusing on the application to the EWM framework of Kitagawa and Tetenov (2018), we provide convergence rates of the worst case average welfare loss (regret) in the spirit of Manski (2004).

Econometrics