No Arabic abstract
We extend balloon and sample-smoothing estimators, two types of variable-bandwidth kernel density estimators, by a shift parameter and derive their asymptotic properties. Our approach facilitates the unified study of a wide range of density estimators which are subsumed under these two general classes of kernel density estimators. We demonstrate our method by deriving the asymptotic bias, variance, and mean (integrated) squared error of density estimators with gamma, log-normal, Birnbaum-Saunders, inverse Gaussian and reciprocal inverse Gaussian kernels. We propose two new density estimators for positive random variables that yield properly-normalised density estimates. Plugin expressions for bandwidth estimation are provided to facilitate easy exploratory data analysis.
In observational studies, balancing covariates in different treatment groups is essential to estimate treatment effects. One of the most commonly used methods for such purposes is weighting. The performance of this class of methods usually depends on strong regularity conditions for the underlying model, which might not hold in practice. In this paper, we investigate weighting methods from a functional estimation perspective and argue that the weights needed for covariate balancing could differ from those needed for treatment effects estimation under low regularity conditions. Motivated by this observation, we introduce a new framework of weighting that directly targets the treatment effects estimation. Unlike existing methods, the resulting estimator for a treatment effect under this new framework is a simple kernel-based $U$-statistic after applying a data-driven transformation to the observed covariates. We characterize the theoretical properties of the new estimators of treatment effects under a nonparametric setting and show that they are able to work robustly under low regularity conditions. The new framework is also applied to several numerical examples to demonstrate its practical merits.
Randomization (a.k.a. permutation) inference is typically interpreted as testing Fishers sharp null hypothesis that all effects are exactly zero. This hypothesis is often criticized as uninteresting and implausible. We show, however, that many randomization tests are also valid for a bounded null hypothesis under which effects are all negative (or positive) for all units but otherwise heterogeneous. The bounded null is closely related to important concepts such as monotonicity and Pareto efficiency. Inverting tests of this hypothesis yields confidence intervals for the maximum (or minimum) individual treatment effect. We then extend randomization tests to infer other quantiles of individual effects, which can be used to infer the proportion of units with effects larger (or smaller) than any threshold. The proposed confidence intervals for all quantiles of individual effects are simultaneously valid, in the sense that no correction due to multiple analyses is needed. In sum, we provide a broader justification for Fisher randomization tests, and develop exact nonparametric inference for quantiles of heterogeneous individual effects. We illustrate our methods with simulations and applications, where we find that Stephenson rank statistics often provide the most informative results.
The Youden index is a popular summary statistic for receiver operating characteristic curve. It gives the optimal cutoff point of a biomarker to distinguish the diseased and healthy individuals. In this paper, we propose to model the distributions of a biomarker for individuals in the healthy and diseased groups via a semiparametric density ratio model. Based on this model, we use the maximum empirical likelihood method to estimate the Youden index and the optimal cutoff point. We further establish the asymptotic normality of the proposed estimators and construct valid confidence intervals for the Youden index and the corresponding optimal cutoff point. The proposed method automatically covers both cases when there is no lower limit of detection (LLOD) and when there is a fixed and finite LLOD for the biomarker. Extensive simulation studies and a real data example are used to illustrate the effectiveness of the proposed method and its advantages over the existing methods.
We develop a unified approach to hypothesis testing for various types of widely used functional linear models, such as scalar-on-function, function-on-function and function-on-scalar models. In addition, the proposed test applies to models of mixed types, such as models with both functional and scalar predictors. In contrast with most existing methods that rest on the large-sample distributions of test statistics, the proposed method leverages the technique of bootstrapping max statistics and exploits the variance decay property that is an inherent feature of functional data, to improve the empirical power of tests especially when the sample size is limited and the signal is relatively weak. Theoretical guarantees on the validity and consistency of the proposed test are provided uniformly for a class of test statistics.
The strata-specific treatment effect or so-called blip for a randomly drawn strata of confounders defines a random variable and a corresponding cumulative distribution function. However, the CDF is not pathwise differentiable, necessitating a kernel smoothing approach to estimate it at a given point or perhaps many points. Assuming the CDF is continuous, we derive the efficient influence curve of the kernel smoothed version of the blip CDF and a CV-TMLE estimator. The estimator is asymptotically efficient under two conditions, one of which involves a second order remainder term which, in this case, shows us that knowledge of the treatment mechanism does not guarantee a consistent estimate. The remainder term also teaches us exactly how well we need to estimate the nuisance parameters (outcome model and treatment mechanism) to guarantee asymptotic efficiency. Through simulations we verify theoretical properties of the estimator and show the importance of machine learning over conventional regression approaches to fitting the nuisance parameters. We also derive the bias and variance of the estimator, the orders of which are analogous to a kernel density estimator. This estimator opens up the possibility of developing methodology for optimal choice of the kernel and bandwidth to form confidence bounds for the CDF itself.