The stratum-specific treatment effect, or so-called blip, for a randomly drawn stratum of confounders defines a random variable and a corresponding cumulative distribution function (CDF). However, the CDF is not pathwise differentiable, necessitating a kernel smoothing approach to estimate it at a given point or at many points. Assuming the CDF is continuous, we derive the efficient influence curve of the kernel-smoothed version of the blip CDF and a CV-TMLE estimator. The estimator is asymptotically efficient under two conditions, one of which involves a second-order remainder term which, in this case, shows that knowledge of the treatment mechanism does not guarantee a consistent estimate. The remainder term also shows exactly how well we need to estimate the nuisance parameters (outcome model and treatment mechanism) to guarantee asymptotic efficiency. Through simulations we verify theoretical properties of the estimator and show the importance of machine learning over conventional regression approaches for fitting the nuisance parameters. We also derive the bias and variance of the estimator, the orders of which are analogous to those of a kernel density estimator. This estimator opens up the possibility of developing methodology for optimal choice of the kernel and bandwidth to form confidence bounds for the CDF itself.
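To make the idea of a kernel-smoothed blip CDF concrete, the following is a minimal sketch of a naive plug-in version only. It is not the CV-TMLE described above: there is no targeting step and no influence-curve-based inference. The function names, the Gaussian kernel, and the assumption that blip estimates are already available from some fitted outcome model are all illustrative choices, not taken from the abstract.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, i.e. the integral of a Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_blip_cdf(blips, t, h):
    """Naive plug-in estimate of the kernel-smoothed blip CDF at point t.

    blips: estimated stratum-specific treatment effects, one per subject
           (assumed to come from an already fitted outcome model).
    h:     bandwidth (hypothetical tuning parameter, must be positive).
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(normal_cdf((t - b) / h) for b in blips) / len(blips)


def empirical_blip_cdf(blips, t):
    """Unsmoothed counterpart: fraction of blip estimates at or below t."""
    return sum(1 for b in blips if b <= t) / len(blips)
```

As the bandwidth shrinks, the smoothed value approaches the empirical fraction of blip estimates below `t` at points that are not themselves blip values, which is the sense in which smoothing trades a small bias for a well-behaved target.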
We extend balloon and sample-smoothing estimators, two types of variable-bandwidth kernel density estimators, by a shift parameter and derive their asymptotic properties. Our approach facilitates the unified study of a wide range of density estimators which are subsumed under these two general classes of kernel density estimators. We demonstrate our method by deriving the asymptotic bias, variance, and mean (integrated) squared error of density estimators with gamma, log-normal, Birnbaum-Saunders, inverse Gaussian and reciprocal inverse Gaussian kernels. We propose two new density estimators for positive random variables that yield properly-normalised density estimates. Plugin expressions for bandwidth estimation are provided to facilitate easy exploratory data analysis.
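As an illustration of a density estimator with a gamma kernel, here is a minimal sketch of one widely used form, in which the kernel evaluated at each observation is a gamma density with shape `x / b + 1` and scale `b`. This is an assumed, unshifted textbook variant chosen for illustration; it is not the shifted estimator or either of the two new estimators proposed above, and the function names are hypothetical.

```python
import math


def gamma_pdf(y, shape, scale):
    """Gamma density at y > 0, computed on the log scale for stability."""
    if y <= 0:
        raise ValueError("this sketch assumes strictly positive data")
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def gamma_kernel_density(data, x, b):
    """Gamma-kernel density estimate at x >= 0 with bandwidth b > 0.

    The kernel parameters depend on the evaluation point x, so no
    probability mass is assigned to the negative half-line.
    """
    if x < 0 or b <= 0:
        raise ValueError("need x >= 0 and b > 0")
    shape = x / b + 1.0
    return sum(gamma_pdf(y, shape, b) for y in data) / len(data)
```

Because the kernel shape varies with `x`, an estimate of this form is in general not guaranteed to integrate to exactly one, which is the kind of issue the properly-normalised estimators mentioned above are meant to address.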
We determine the expected error by smoothing the data locally. Then we optimize the shape of the kernel smoother to minimize the error. Because the optimal estimator depends on the unknown function, our scheme automatically adjusts to the unknown function. By self-consistently adjusting the kernel smoother, the total estimator adapts to the data. Goodness-of-fit estimators select a kernel halfwidth by minimizing a function of the halfwidth which is based on the average square residual fit error, $ASR(h)$. A penalty term is included to adjust for using the same data to estimate the function and to evaluate the mean square error. Goodness-of-fit estimators are relatively simple to implement, but the minimum of the goodness-of-fit functional tends to be sensitive to small perturbations. To remedy this sensitivity problem, we fit the mean square error to a two-parameter model prior to determining the optimal halfwidth. Plug-in derivative estimators estimate the second derivative of the unknown function in an initial step, and then substitute this estimate into the asymptotic formula.
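The goodness-of-fit idea can be sketched as follows: compute $ASR(h)$ for a kernel smoother, inflate it by a penalty for reusing the data, and pick the halfwidth that minimizes the result over a grid. The abstract does not specify the penalty, so the generalized-cross-validation-style factor used here is an assumption, as are the Gaussian kernel, the Nadaraya-Watson smoother, and all function names. The two-parameter model fit mentioned above is not implemented.

```python
import math


def smoother_weights(x, x0, h):
    """Normalized Gaussian kernel weights of all design points at x0."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    total = sum(w)  # at least 1 when x0 is one of the design points
    return [wi / total for wi in w]


def penalized_asr(x, y, h):
    """Average square residual ASR(h) times an assumed GCV-style penalty."""
    n = len(x)
    asr = 0.0
    trace = 0.0  # trace of the smoother matrix (effective parameters)
    for i in range(n):
        w = smoother_weights(x, x[i], h)
        fit = sum(wj * yj for wj, yj in zip(w, y))
        asr += (y[i] - fit) ** 2
        trace += w[i]
    asr /= n
    denom = (1.0 - trace / n) ** 2
    # A smoother that interpolates the data gets an infinite penalty.
    return math.inf if denom <= 0.0 else asr / denom


def select_halfwidth(x, y, grid):
    """Return the grid halfwidth with the smallest penalized ASR."""
    return min(grid, key=lambda h: penalized_asr(x, y, h))
```

A grid search like this inherits the sensitivity noted above: nearby halfwidths often have nearly equal criterion values, so small changes in the data can move the minimizer.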
We offer a non-parametric plug-in estimator for an important measure of treatment effect variability and provide minimum conditions under which the estimator is asymptotically efficient. The stratum-specific treatment effect function, or so-called blip function, is the average treatment effect for a randomly drawn stratum of confounders. The mean of the blip function is the average treatment effect (ATE), whereas the variance of the blip function (VTE), the main subject of this paper, measures overall clinical effect heterogeneity, perhaps providing a strong impetus to refine treatment based on the confounders. VTE is also an important measure for assessing reliability of the treatment for an individual. The CV-TMLE provides simultaneous plug-in estimates and inference for both ATE and VTE, guaranteeing asymptotic efficiency under one fewer condition than TMLE. This condition is difficult to guarantee a priori, particularly when using the highly adaptive machine learning that we need to employ in order to eliminate bias. Even when this condition is violated, CV-TMLE sampling distributions maintain normality, which is not guaranteed for TMLE, and have a lower mean squared error than their TMLE counterparts. In addition to verifying the theoretical properties of TMLE and CV-TMLE through simulations, we point out some of the challenges in estimating VTE, which lacks double robustness and might be unavoidably biased if the true VTE is small and the sample size insufficient. We will provide an application of the estimator on a data set for treatment of acute trauma patients.
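The relationship between the blip function, ATE, and VTE can be shown with a minimal sketch of a naive plug-in computation. It assumes that predicted outcomes under treatment and under control are already available for each subject from some fitted outcome model; the function and argument names are hypothetical. It performs no TMLE or CV-TMLE targeting step and provides no inference, so it only illustrates the two target quantities, not the estimator described above.

```python
def ate_vte_plugin(pred_treated, pred_control):
    """Naive plug-in ATE and VTE from per-subject outcome predictions.

    pred_treated[i], pred_control[i]: predicted outcome for subject i's
    confounders under treatment and under control (assumed inputs).
    Returns (ATE, VTE) as the mean and variance of the estimated blips.
    """
    if len(pred_treated) != len(pred_control) or not pred_treated:
        raise ValueError("need two non-empty sequences of equal length")
    blips = [q1 - q0 for q1, q0 in zip(pred_treated, pred_control)]
    n = len(blips)
    ate = sum(blips) / n
    vte = sum((b - ate) ** 2 for b in blips) / n
    return ate, vte
```

For example, `ate_vte_plugin([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])` yields an ATE of 2 and a VTE of 2/3, up to floating-point rounding.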
We propose the adversarially robust kernel smoothing (ARKS) algorithm, combining kernel smoothing, robust optimization, and adversarial training for robust learning. Our methods are motivated by the convex analysis perspective of distributionally robust optimization based on probability metrics, such as the Wasserstein distance and the maximum mean discrepancy. We adapt the integral operator using supremal convolution in convex analysis to form a novel function majorant used for enforcing robustness. Our method is simple in form and applies to general loss functions and machine learning models. Furthermore, we report experiments with general machine learning models, such as deep neural networks, to demonstrate that ARKS performs competitively with the state-of-the-art methods based on the Wasserstein distance.
In personalised decision making, evidence is required to determine suitable actions for individuals. Such evidence can be obtained by identifying treatment effect heterogeneity in different subgroups of the population. In this paper, we design a new type of pattern, the treatment effect pattern, to represent and discover treatment effect heterogeneity from data for determining whether a treatment will work for an individual or not. Our purpose is to use computational power to find the most specific and relevant conditions for individuals with respect to a treatment or an action to assist with personalised decision making. Most existing work on identifying treatment effect heterogeneity takes a top-down or partitioning-based approach to search for subgroups with heterogeneous treatment effects. We propose a bottom-up generalisation algorithm to obtain the most specific patterns that best fit individual circumstances for personalised decision making. For the generalisation, we follow a consistency-driven strategy to maintain inner-group homogeneity and inter-group heterogeneity of treatment effects. We also employ a graphical causal modelling technique to identify adjustment variables for reliable treatment effect pattern discovery. Experiments validate that our method finds treatment effect patterns reliably. The method is faster than the two existing machine learning methods for heterogeneous treatment effect identification, and it produces subgroups with higher inner-group treatment effect homogeneity.