No Arabic abstract
Truncated conditional expectation functions are objects of interest in a wide range of economic applications, including income inequality measurement, financial risk management, and impact evaluation. They typically involve truncating the outcome variable above or below certain quantiles of its conditional distribution. In this paper, based on local linear methods, a novel, two-stage, nonparametric estimator of such functions is proposed. In this estimation problem, the conditional quantile function is a nuisance parameter that has to be estimated in the first stage. The proposed estimator is insensitive to the first-stage estimation error owing to the use of a Neyman-orthogonal moment in the second stage. This construction ensures that inference methods developed for the standard nonparametric regression can be readily adapted to conduct inference on truncated conditional expectations. As an extension, estimation with an estimated truncation quantile level is considered. The proposed estimator is applied in two empirical settings: sharp regression discontinuity designs with a manipulated running variable and randomized experiments with sample selection.
Conditional density estimation generalizes regression by modeling a full density f(yjx) rather than only the expected value E(yjx). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey.
Given the unconfoundedness assumption, we propose new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function. In the first stage, the nuisance functions necessary for identifying CATE are estimated by machine learning methods, allowing the number of covariates to be comparable to or larger than the sample size. The second stage consists of a low-dimensional local linear regression, reducing CATE to a function of the covariate(s) of interest. We consider two variants of the estimator depending on whether the nuisance functions are estimated over the full sample or over a hold-out sample. Building on Belloni at al. (2017) and Chernozhukov et al. (2018), we derive functional limit theory for the estimators and provide an easy-to-implement procedure for uniform inference based on the multiplier bootstrap. The empirical application revisits the effect of maternal smoking on a babys birth weight as a function of the mothers age.
Instrumental variables (IV) regression is a popular method for the estimation of the endogenous treatment effects. Conventional IV methods require all the instruments are relevant and valid. However, this is impractical especially in high-dimensional models when we consider a large set of candidate IVs. In this paper, we propose an IV estimator robust to the existence of both the invalid and irrelevant instruments (called R2IVE) for the estimation of endogenous treatment effects. This paper extends the scope of Kang et al. (2016) by considering a true high-dimensional IV model and a nonparametric reduced form equation. It is shown that our procedure can select the relevant and valid instruments consistently and the proposed R2IVE is root-n consistent and asymptotically normal. Monte Carlo simulations demonstrate that the R2IVE performs favorably compared to the existing high-dimensional IV estimators (such as, NAIVE (Fan and Zhong, 2018) and sisVIVE (Kang et al., 2016)) when invalid instruments exist. In the empirical study, we revisit the classic question of trade and growth (Frankel and Romer, 1999).
We provide a novel inferential framework to estimate the exact affine Stone index (EASI) model, and analyze welfare implications due to price changes caused by taxes. Our inferential framework is based on a non-parametric specification of the stochastic errors in the EASI incomplete demand system using Dirichlet processes. Our proposal enables to identify consumer clusters due to unobserved preference heterogeneity taking into account, censoring, simultaneous endogeneity and non-linearities. We perform an application based on a tax on electricity consumption in the Colombian economy. Our results suggest that there are four clusters due to unobserved preference heterogeneity; although 95% of our sample belongs to one cluster. This suggests that observable variables describe preferences in a good way under the EASI model in our application. We find that utilities seem to be inelastic normal goods with non-linear Engel curves. Joint predictive distributions indicate that electricity tax generates substitution effects between electricity and other non-utility goods. These distributions as well as Slutsky matrices suggest good model assessment. We find that there is a 95% probability that the equivalent variation as percentage of income of the representative household is between 0.60% to 1.49% given an approximately 1% electricity tariff increase. However, there are heterogeneous effects with higher socioeconomic strata facing more welfare losses on average. This highlights the potential remarkable welfare implications due taxation on inelastic services.
In this paper, a nonparametric maximum likelihood (ML) estimator for band-limited (BL) probability density functions (pdfs) is proposed. The BLML estimator is consistent and computationally efficient. To compute the BLML estimator, three approximate algorithms are presented: a binary quadratic programming (BQP) algorithm for medium scale problems, a Trivial algorithm for large-scale problems that yields a consistent estimate if the underlying pdf is strictly positive and BL, and a fast implementation of the Trivial algorithm that exploits the band-limited assumption and the Nyquist sampling theorem (BLMLQuick). All three BLML estimators outperform kernel density estimation (KDE) algorithms (adaptive and higher order KDEs) with respect to the mean integrated squared error for data generated from both BL and infinite-band pdfs. Further, the BLMLQuick estimate is remarkably faster than the KD algorithms. Finally, the BLML method is applied to estimate the conditional intensity function of a neuronal spike train (point process) recorded from a rats entorhinal cortex grid cell, for which it outperforms state-of-the-art estimators used in neuroscience.