This paper studies the generalization of the targeted minimum loss-based estimation (TMLE) framework to the estimation of effects of time-varying interventions in settings where interventions, covariates, and outcomes can occur at subject-specific time points on an arbitrarily fine time scale. TMLE is a general template for constructing asymptotically linear substitution estimators for smooth low-dimensional parameters in infinite-dimensional models. Existing longitudinal TMLE methods are developed for data where observations are made on a discrete time grid. We consider a continuous-time counting-process model where intensity measures track the monitoring of subjects, and focus on a low-dimensional target parameter defined as the intervention-specific mean outcome at the end of follow-up. To construct our TMLE algorithm for the given statistical estimation problem, we derive an expression for the efficient influence curve and represent the target parameter as a functional of intensities and conditional expectations. The high-dimensional nuisance parameters of our model are estimated and updated iteratively via separate targeting steps for the involved intensities and conditional expectations. The resulting estimator solves the efficient influence curve equation. We state a general efficiency theorem and describe a highly adaptive lasso estimator for the nuisance parameters that allows us to establish asymptotic linearity and efficiency of our estimator under minimal conditions on the underlying statistical model.
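As a rough illustration of the targeting idea (not the continuous-time algorithm studied in this abstract), the following Python sketch performs a single TMLE fluctuation step for the point-treatment mean outcome E[Y(1)], assuming a bounded outcome and externally fitted nuisance estimates; the function name and interface are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit, logit

def tmle_mean_outcome(y, a, q1, g1):
    """One targeting step for E[Y(1)] with outcome Y in [0, 1].
    q1: initial estimates of E[Y | A=1, W]; g1: estimates of P(A=1 | W)."""
    off = logit(np.clip(q1, 1e-10, 1 - 1e-10))   # initial fit on the logit scale
    h = 1.0 / g1                                 # "clever covariate" H(1, W)

    def negloglik(eps):
        # Logistic fluctuation; only treated observations (a == 1) contribute.
        p = np.clip(expit(off + eps * h), 1e-10, 1 - 1e-10)
        return -np.sum(a * (y * np.log(p) + (1 - y) * np.log(1 - p)))

    eps_hat = minimize_scalar(negloglik, bounds=(-5, 5), method="bounded").x
    return np.mean(expit(off + eps_hat * h))     # targeted plug-in estimate
```

In this toy setting the logistic fluctuation along the clever covariate A/g(W) is what makes the updated plug-in estimate solve the efficient influence curve equation.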
Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to evaluate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relations. One reason conditional mutual information is not more widely used for these tasks is the lack of estimators which can handle combinations of continuous and discrete random variables, common in applications. This paper develops a new method for estimating mutual and conditional mutual information for data samples containing a mix of discrete and continuous variables. We prove that this estimator is consistent and show, via simulation, that it is more accurate than similar estimators.
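For intuition, here is a minimal Python sketch of a Kraskov-style nearest-neighbour mutual information estimate that tolerates the ties produced by discrete coordinates, in the spirit of mixed-data estimators; it is not necessarily the paper's exact estimator, and the function name is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def mixed_mi(x, y, k=5):
    """k-NN mutual information estimate for mixed discrete/continuous data
    (max-norm distances throughout)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    xy = np.hstack([x, y])
    t_xy, t_x, t_y = cKDTree(xy), cKDTree(x), cKDTree(y)
    rho = t_xy.query(xy, k + 1, p=np.inf)[0][:, k]   # k-th neighbour distance
    total = 0.0
    for i in range(n):
        if rho[i] == 0.0:
            # Discrete ties: the k-th neighbour coincides with the point, so
            # use the multiplicity of the tied value as the effective k.
            k_i = len(t_xy.query_ball_point(xy[i], 1e-15, p=np.inf))
            r = 0.0
        else:
            k_i, r = k, rho[i]
        n_x = len(t_x.query_ball_point(x[i], r + 1e-15, p=np.inf))  # incl. i
        n_y = len(t_y.query_ball_point(y[i], r + 1e-15, p=np.inf))
        total += digamma(k_i) + np.log(n) - np.log(n_x) - np.log(n_y)
    return max(total / n, 0.0)
```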
Coarse structural nested mean models are used to estimate treatment effects from longitudinal observational data. They lead to a large class of estimators, and estimates and standard errors may differ considerably within this class. We prove that, under additional assumptions, there exists an explicit solution for the optimal estimator within the class of coarse structural nested mean models. Moreover, we show that even if the additional assumptions do not hold, this optimal estimator is doubly robust: it is consistent and asymptotically normal not only if the model for treatment initiation is correct, but also if a certain outcome-regression model is correct. We compare the optimal estimator to some naive choices within the class of coarse structural nested mean models in a simulation study. Furthermore, we apply the optimal and naive estimators to study how the CD4 count increase due to one year of antiretroviral treatment (ART) depends on the time between HIV infection and ART initiation in recently infected patients. Both in the simulation study and in the application, the use of optimal estimators leads to substantial gains in precision.
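Double robustness is easiest to see in a point-treatment analogue. The following Python sketch computes the augmented IPW estimate of an average treatment effect from fitted nuisance predictions; it illustrates the property, not the coarse structural nested mean model estimator itself, and all names are hypothetical.

```python
import numpy as np

def aipw_ate(y, a, mu1, mu0, e):
    """Augmented IPW estimate of E[Y(1) - Y(0)] from fitted nuisances:
    mu1, mu0 are outcome-regression predictions E[Y | A=a, X], e is the
    propensity score P(A=1 | X). The estimate is consistent if either the
    outcome model or the treatment model is correctly specified."""
    return np.mean(mu1 - mu0
                   + a * (y - mu1) / e
                   - (1 - a) * (y - mu0) / (1 - e))
```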
This paper deals with dimension reduction for high-dimensional time series based on common factors. In particular, we allow the dimension of the time series $p$ to be as large as, or even larger than, the sample size $n$. The estimation of the factor loading matrix and the factor process itself is carried out via an eigenanalysis of a $p\times p$ non-negative definite matrix. We show that when all the factors are strong, in the sense that the norm of each column of the factor loading matrix is of order $p^{1/2}$, the estimator of the factor loading matrix, as well as the resulting estimator of the precision matrix of the original $p$-variate time series, are weakly consistent in the $L_2$-norm with convergence rates independent of $p$. This result shows clearly that the `curse' is canceled out by the `blessings' of dimensionality. We also establish the asymptotic properties of the estimation when not all factors are strong. For the latter case, a two-step estimation procedure is preferred according to the asymptotic theory. The proposed methods and their asymptotic properties are further illustrated in a simulation study. An application to a real data set is also reported.
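One standard way to build such a $p\times p$ non-negative definite matrix is to accumulate products of lagged sample autocovariances and take its leading eigenvectors as the loading estimate, as in the Python sketch below; the function name is hypothetical, and the number of factors r and the lag cutoff k0 are assumed known for illustration.

```python
import numpy as np

def estimate_factor_loadings(Y, r, k0=2):
    """Y is an (n, p) data matrix. Returns a (p, r) loading estimate and the
    (n, r) estimated factor process via eigenanalysis of a p x p
    non-negative definite matrix built from lagged autocovariances."""
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S_k = Yc[k:].T @ Yc[:-k] / n   # lag-k sample autocovariance
        M += S_k @ S_k.T               # non-negative definite by construction
    eigval, eigvec = np.linalg.eigh(M) # eigenvalues in ascending order
    A_hat = eigvec[:, ::-1][:, :r]     # top-r eigenvectors as loadings
    F_hat = Yc @ A_hat                 # estimated factor process
    return A_hat, F_hat
```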
Selective recruitment designs preferentially recruit individuals who are estimated to be statistically informative onto a clinical trial; individuals who are expected to contribute less information have a lower probability of recruitment. Furthermore, in an information-adaptive design, recruits are allocated to treatment arms in a manner that maximises information gain. The informativeness of an individual depends on their covariate (or biomarker) values, and how information is quantified is a critical element of information-adaptive designs. In this paper we define and evaluate four different methods for quantifying statistical information. Using both experimental data and numerical simulations, we show that selective recruitment designs can offer a substantial increase in statistical power compared to randomised designs. In trials without selective recruitment, we find that allocating individuals to treatment arms according to information-adaptive protocols also increases statistical power. Consequently, selective recruitment designs can potentially achieve successful trials using fewer recruits, thereby offering economic and ethical advantages.
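As a toy example of one possible information measure (not necessarily among the four evaluated in this abstract), the following Python sketch allocates a new recruit to the arm that most increases a D-optimality criterion for a linear model with an arm indicator; all names are hypothetical.

```python
import numpy as np

def allocate_arm(X, arm, x_new):
    """X: (n, d) covariates of current recruits; arm: (n,) 0/1 indicators;
    x_new: (d,) covariates of the new recruit. Returns the arm (0 or 1)
    that most increases log det of the design information matrix."""
    def logdet_if(a):
        D = np.vstack([np.column_stack([X, arm]), np.append(x_new, a)])
        M = D.T @ D + 1e-8 * np.eye(D.shape[1])   # ridge term for stability
        return np.linalg.slogdet(M)[1]
    return int(logdet_if(1) > logdet_if(0))
```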
Several novel statistical methods have been developed to estimate large integrated volatility matrices based on high-frequency financial data. Their asymptotic analyses require a sub-Gaussian or finite high-order moment assumption on the observed log-returns, which cannot account for the heavy tails of stock returns. Recently, a robust estimator was developed to handle heavy-tailed distributions under a bounded fourth-moment assumption. However, log-returns often have tails heavier than a finite fourth moment allows, and the degree of tail heaviness is heterogeneous across assets and time periods. In this paper, to deal with such heterogeneous heavy-tailed distributions, we develop an adaptive robust integrated volatility estimator that employs pre-averaging and truncation schemes based on jump-diffusion processes. We call this the adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has sub-Weibull tail concentration with only finite $2\alpha$-th moments for any $\alpha>1$. In addition, we establish matching upper and lower bounds to show that the ARP estimation procedure is optimal. To estimate large integrated volatility matrices under an approximate factor model, the ARP estimator is further regularized using the principal orthogonal complement thresholding (POET) method. A numerical study is conducted to check the finite-sample performance of the ARP estimator.
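The following Python sketch illustrates the two ingredients named here, pre-averaging and truncation, for a scalar realized-volatility estimate; it omits the microstructure-noise bias correction, uses an ad hoc truncation threshold, and is not the ARP estimator itself.

```python
import numpy as np

def pre_averaged_truncated_rv(log_prices, K=None, c=4.0):
    """Pre-average noisy high-frequency log-returns with the weight
    g(x) = min(x, 1 - x), truncate large pre-averaged returns, and sum
    squares (noise bias correction omitted for brevity)."""
    r = np.diff(np.asarray(log_prices, dtype=float))
    n = len(r)
    if K is None:
        K = max(2, int(np.sqrt(n)))               # window of order sqrt(n)
    j = np.arange(1, K) / K
    g = np.minimum(j, 1 - j)                      # pre-averaging weights
    # Overlapping pre-averaged returns smooth out microstructure noise.
    rbar = np.array([g @ r[i:i + K - 1] for i in range(n - K + 2)])
    u = c * np.std(rbar)                          # truncation threshold
    rbar = np.where(np.abs(rbar) <= u, rbar, 0.0) # drop jump/heavy-tail spikes
    return np.sum(rbar ** 2) / np.sum(g ** 2)
```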