The cross-classified sampling design consists of drawing samples from a two-dimensional population, independently in each dimension. Such a design is commonly used in consumer price index surveys and has recently been applied to draw a sample of babies in the French ELFE survey, by crossing a sample of maternity units and a sample of days. We derive a general theory of estimation for this sampling design. We consider the Horvitz-Thompson estimator for a total, and show that the cross-classified design will usually result in a loss of efficiency compared to the widespread two-stage design. We obtain the asymptotic distribution of the Horvitz-Thompson estimator, and several unbiased variance estimators. Because these variance estimators may take negative values, we propose simplified non-negative variance estimators and study their bias under a super-population model. The proposed estimators are compared for totals and ratios on simulated data. An application to real data from the ELFE survey is also presented, and we make some recommendations. Supplementary materials are available online.
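As a rough illustration of the Horvitz-Thompson estimator under a cross-classified design, the sketch below assumes simple random sampling without replacement in each dimension, so that the inclusion probability of cell (i, j) is the product of the row and column inclusion probabilities. The function name `ht_total_cross_classified` and the toy population are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

def ht_total_cross_classified(y, rows, cols, pi_rows, pi_cols):
    """Horvitz-Thompson estimate of the total of y over all cells.

    y        : (N_r, N_c) array of cell values (toy population, assumed known here)
    rows     : indices of the sampled rows (e.g. maternity units)
    cols     : indices of the sampled columns (e.g. days)
    pi_rows  : first-order inclusion probabilities of the rows
    pi_cols  : first-order inclusion probabilities of the columns
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # The two samples are drawn independently, so pi_ij = pi_i * pi_j.
    weights = 1.0 / np.outer(pi_rows[rows], pi_cols[cols])
    return float(np.sum(y[np.ix_(rows, cols)] * weights))

# Toy population: 3 rows x 4 columns (illustrative values only).
y = np.arange(12, dtype=float).reshape(3, 4)
n_r, n_c = 2, 2
pi_rows = np.full(3, n_r / 3)   # SRSWOR of 2 rows out of 3
pi_cols = np.full(4, n_c / 4)   # SRSWOR of 2 columns out of 4

# Enumerate every possible pair of samples; all are equally likely under SRSWOR.
estimates = [
    ht_total_cross_classified(y, list(r), list(c), pi_rows, pi_cols)
    for r in itertools.combinations(range(3), n_r)
    for c in itertools.combinations(range(4), n_c)
]
mean_estimate = float(np.mean(estimates))
true_total = float(y.sum())
```

Averaging the estimate over all equally likely sample pairs recovers the true total, which is the design-unbiasedness property. The spread of `estimates` gives the design variance for this toy case only and says nothing about the comparison with two-stage sampling.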
Coarse structural nested mean models are used to estimate treatment effects from longitudinal observational data. These models lead to a large class of estimators, and estimates and standard errors may differ considerably within this class. We prove that, under additional assumptions, there exists an explicit solution for the optimal estimator within the class of coarse structural nested mean models. Moreover, we show that even if the additional assumptions do not hold, this optimal estimator is doubly robust: it is consistent and asymptotically normal not only if the model for treatment initiation is correct, but also if a certain outcome-regression model is correct. We compare the optimal estimator to some naive choices within the class of coarse structural nested mean models in a simulation study. Furthermore, we apply the optimal and naive estimators to study how the CD4 count increase due to one year of antiretroviral treatment (ART) depends on the time between HIV infection and ART initiation in recently infected HIV patients. Both in the simulation study and in the application, the use of optimal estimators leads to substantial increases in precision.
This paper focuses on the time series generated by the event counts of stationary Hawkes processes. When the exact locations of points are not observed, but only counts over time intervals of fixed size, existing methods of estimation are not applicable. We first establish a strong mixing condition with polynomial decay rate for Hawkes processes, based on their Poisson cluster structure. This allows us to propose a spectral approach to the estimation of Hawkes processes, based on Whittle's method, which provides consistent and asymptotically normal estimates under common regularity conditions on their reproduction kernels. Simulated datasets and a case study illustrate the performance of the estimation, notably of the Hawkes reproduction mean and kernel when time intervals are relatively large.
In this paper, we estimate the high-dimensional precision matrix under the weak sparsity condition, where many entries are nearly zero. We study a Lasso-type method for high-dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed, and the results are applicable to weakly sparse matrices. As applications, we study precision matrix estimation for heavy-tailed data, nonparanormal data, and matrix data with the Lasso-type method.
The infinite-dimensional Hilbert sphere $S^\infty$ has been widely employed to model density functions and shapes, extending the finite-dimensional counterpart. We consider the Fréchet mean as an intrinsic summary of the central tendency of data lying on $S^\infty$. To pave the way for sound statistical inference, we derive properties of the Fréchet mean on $S^\infty$ by establishing its existence and uniqueness as well as a root-$n$ central limit theorem (CLT) for the sample version, overcoming obstructions from infinite-dimensionality and lack of compactness on $S^\infty$. Intrinsic CLTs for the estimated tangent vectors and covariance operator are also obtained. Asymptotic and bootstrap hypothesis tests for the Fréchet mean based on projection and norm are then proposed and are shown to be consistent. The proposed two-sample tests are applied to make inference for daily taxi demand patterns over Manhattan modeled as densities, of which the square roots are analyzed on the Hilbert sphere. Numerical properties of the proposed hypothesis tests, which utilize the spherical geometry, are studied in the real data application and simulations, where we demonstrate that the tests based on the intrinsic geometry compare favorably to those based on an extrinsic or flat geometry.
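To make the notion of an intrinsic mean concrete, the following is a minimal sketch of a sample Fréchet mean on a finite-dimensional unit sphere, used here as a stand-in for a discretized version of the Hilbert sphere. It uses a standard fixed-point iteration with the spherical log and exp maps, which is a common generic algorithm and not necessarily the procedure used in the paper. The function names and the toy data are hypothetical, and the data are assumed to lie within a small geodesic ball (no antipodal points).

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q, with length equal to the geodesic distance."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - c * p                      # component of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def exp_map(p, v):
    """Point reached by following the geodesic from p in direction v for length |v|."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

def frechet_mean(points, n_iter=100, tol=1e-10):
    """Sample Frechet mean on the unit sphere via iterated averaging in the tangent space."""
    mu = points.mean(axis=0)
    mu = mu / np.linalg.norm(mu)       # start from the normalized Euclidean mean
    for _ in range(n_iter):
        g = np.mean([log_map(mu, x) for x in points], axis=0)
        mu = exp_map(mu, g)
        mu = mu / np.linalg.norm(mu)   # guard against numerical drift off the sphere
        if np.linalg.norm(g) < tol:
            break
    return mu

# Toy data: four points placed symmetrically around the north pole of S^2.
north = np.array([0.0, 0.0, 1.0])
tangents = [np.array(v) for v in
            ([0.3, 0.0, 0.0], [-0.3, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, -0.3, 0.0])]
points = np.array([exp_map(north, v) for v in tangents])
mu_hat = frechet_mean(points)
```

For square-root densities evaluated on a grid, each data point would be a non-negative unit vector, so all pairwise geodesic distances are at most $\pi/2$ and the log map above is well defined.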
Our problem is to find a good approximation to the P-value of the maximum of a random field of test statistics for a cone alternative at each point in a sample of Gaussian random fields. These test statistics have been proposed in the neuroscience literature for the analysis of fMRI data allowing for unknown delay in the hemodynamic response. However, the null distribution of the maximum of this 3D random field of test statistics, and hence the threshold used to detect brain activation, was unknown. To find a solution, we approximate the P-value by the expected Euler characteristic (EC) of the excursion set of the test statistic random field. Our main result is the required EC density, derived using the Gaussian Kinematic Formula.
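The approximation described above is usually written in the following general form. This is the standard expected-EC heuristic from the random field literature, given here as background under assumed notation ($T$ for the test statistic field, $S$ for the search region, $D$ for its dimension, $\mathcal{L}_d$ for the Lipschitz-Killing curvatures of $S$, and $\rho_d$ for the EC densities of $T$). It is not the paper's specific EC density for the cone statistic.

```latex
% Expected Euler characteristic heuristic for the P-value of the maximum.
% A_t is the excursion set of the field T above the threshold t.
\[
  \mathbb{P}\Bigl( \max_{s \in S} T(s) \ge t \Bigr)
  \;\approx\; \mathbb{E}\bigl[ \chi(A_t) \bigr]
  \;=\; \sum_{d=0}^{D} \mathcal{L}_d(S)\, \rho_d(t),
  \qquad A_t = \{ s \in S : T(s) \ge t \}.
\]
```

For fMRI data the search region is three-dimensional, so $D = 3$ and the sum has four terms. The approximation is generally regarded as accurate for high thresholds $t$.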