Linear Mixed Effects (LME) models have been widely applied to clustered data in many areas, including marketing research, clinical trials, and biomedical studies. Inference can be conducted using the maximum likelihood approach when the random effects are assumed to follow Normal distributions. However, in many applications in economics, business, and medicine, it is often essential to impose constraints on the regression parameters so that they remain consistent with their real-world interpretations. In this paper we therefore extend the classical (unconstrained) LME models to allow for sign constraints on their overall coefficients. We propose to assume a symmetric doubly truncated Normal (SDTN) distribution on the random effects instead of the unconstrained Normal distribution commonly used in the classical literature. With this change, estimation becomes considerably more difficult because the exact distribution of the dependent variable is analytically intractable. We then develop likelihood-based approaches that estimate the unknown model parameters using an approximation of this exact distribution. Simulation studies show that the proposed constrained model not only improves the real-world interpretability of the results, but also achieves satisfactory model fit compared to the existing model.
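To illustrate why truncating the random effects can enforce a sign constraint, the following minimal sketch draws from a zero-mean Normal distribution truncated symmetrically to [-bound, bound] by inverse-CDF sampling. The function name `sample_sdtn` and its parameters are hypothetical, and tying the truncation bound to the size of the fixed effect is an assumption made here for illustration, not a detail taken from the abstract.

```python
import random
from statistics import NormalDist


def sample_sdtn(sigma, bound, size, seed=0):
    """Draw `size` values from N(0, sigma^2) truncated to [-bound, bound].

    Uses inverse-CDF sampling: a uniform draw between the CDF values of
    the two truncation points is mapped back through the Normal quantile
    function. Assumes bound / sigma is moderate (roughly below 8), so the
    CDF values stay strictly inside (0, 1).
    """
    rng = random.Random(seed)
    std = NormalDist()  # standard Normal
    lo = std.cdf(-bound / sigma)
    hi = std.cdf(bound / sigma)
    return [sigma * std.inv_cdf(rng.uniform(lo, hi)) for _ in range(size)]


# Illustrative assumption: a positive fixed effect beta and random effects
# truncated to [-beta, beta], so every subject-level coefficient
# beta + b_i keeps a non-negative sign.
beta = 0.5
random_effects = sample_sdtn(sigma=1.0, bound=beta, size=1000)
overall_coefficients = [beta + b for b in random_effects]
```

Under this assumed choice of bound, every entry of `overall_coefficients` lies in [0, 2 * beta], whereas an untruncated Normal random effect would give some subjects a negative coefficient.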
The functional linear model is a popular tool for investigating the relationship between a scalar or functional response variable and a scalar or functional covariate. We generalize this model to a functional linear mixed-effects model for settings in which repeated measurements are available on multiple subjects. Each subject has an individual intercept and slope function, while all subjects share a common population intercept and slope function. The model is flexible in that it allows the slope random effects to change with time. We propose a penalized spline smoothing method to estimate the population and random slope functions. A REML-based EM algorithm is developed to estimate the variance parameters of the random effects and the data noise. Simulation studies show that our estimation method provides accurate estimates for the functional linear mixed-effects model in finite samples. The functional linear mixed-effects model is demonstrated by investigating the effect of 24-hour nitrogen dioxide on daily maximum ozone concentrations and by studying the effect of daily temperature on annual precipitation.
Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation of, and inference on, the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. For the fixed effects, we provide rate-optimal estimators and valid inference procedures that do not rely on structural information about the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study of the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.
Modeling of longitudinal data often requires diffusion models that incorporate overall time-dependent, nonlinear dynamics of multiple components and provide sufficient flexibility for subject-specific modeling. This complexity challenges parameter inference, and approximations are inevitable. We propose a method for approximate maximum-likelihood parameter estimation in multivariate time-inhomogeneous diffusions, where subject-specific flexibility is accounted for by incorporating multidimensional mixed effects and covariates. We consider $N$ multidimensional independent diffusions $X^i = (X^i_t)_{0 \leq t \leq T^i}$, $1 \leq i \leq N$, with common overall model structure and unknown fixed-effects parameter $\mu$. Their dynamics differ by the subject-specific random effect $\phi^i$ in the drift and possibly by (known) covariate information, different initial conditions, and different observation times and durations. The distribution of $\phi^i$ is parametrized by an unknown $\vartheta$, and $\theta = (\mu, \vartheta)$ is the target of statistical inference. Its maximum likelihood estimator is derived from the continuous-time likelihood. We prove consistency and asymptotic normality of $\hat{\theta}_N$ as the number $N$ of subjects goes to infinity using standard techniques, and we consider the more general concept of local asymptotic normality for less regular models. The bias induced by time-discretization of sufficient statistics is investigated. We discuss verification of conditions and investigate parameter estimation and hypothesis testing in simulations.
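As a rough illustration of this data structure, the sketch below simulates independent one-dimensional diffusions whose drift contains a shared fixed effect and a subject-specific Gaussian random effect, using the Euler-Maruyama scheme. The mean-reverting drift, the Gaussian random-effect distribution, and all names (`simulate_mixed_diffusions`, `omega`, `sigma`) are assumptions chosen for illustration; the abstract does not fix a particular model, and this is a simulation sketch rather than the proposed estimator.

```python
import math
import random


def simulate_mixed_diffusions(n_subjects, mu, omega, sigma, x0, t_end,
                              n_steps, seed=0):
    """Simulate n_subjects paths of dX = -(mu + phi_i) * X dt + sigma dW.

    mu is the shared fixed effect, phi_i ~ N(0, omega^2) is the
    subject-specific random effect in the drift (an assumed form), and
    sigma is a constant diffusion coefficient. Returns one list of
    n_steps + 1 values per subject, on an equally spaced time grid.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    paths = []
    for _ in range(n_subjects):
        phi = rng.gauss(0.0, omega)  # subject-specific random effect
        x = x0
        path = [x]
        for _ in range(n_steps):
            drift = -(mu + phi) * x
            # Euler-Maruyama step: drift * dt plus a scaled Gaussian increment
            x = x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths


paths = simulate_mixed_diffusions(
    n_subjects=5, mu=1.0, omega=0.2, sigma=0.3, x0=1.0, t_end=1.0, n_steps=100
)
```

Because the paths are only available on a discrete grid, any statistic defined through a time integral has to be replaced by a finite sum, which is where a discretization bias of the kind mentioned above can enter.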
The support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups while simultaneously maximizing effective sample size. We also show that SVM is a continuous relaxation of the quadratic integer program for computing the largest balanced subset, establishing its direct relation to the cardinality matching method. Another important feature of SVM is that the regularization parameter controls the trade-off between covariate balance and effective sample size. As a result, the existing SVM path algorithm can be used to compute the frontier between balance and sample size. We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to existing kernel balancing methods. Finally, we conduct simulation and empirical studies to evaluate the performance of the proposed methodology and find that SVM is competitive with state-of-the-art covariate balancing methods.
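The two quantities being traded off can be made concrete with a small sketch. The code below computes a weighted squared maximum mean discrepancy between two groups under a Gaussian (RBF) kernel, and an effective sample size using Kish's formula. It does not fit an SVM; the kernel choice, the use of Kish's formula, and all function names are assumptions made for illustration and may differ from the definitions used in the paper.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - z||^2) for two covariate tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def _normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def weighted_mmd_squared(treated, control, w_treated, w_control, gamma=1.0):
    """Squared MMD between the weighted treated and control samples.

    Weights are normalized to sum to one within each group, so the result
    compares the two weighted empirical covariate distributions.
    """
    a = _normalize(w_treated)
    b = _normalize(w_control)
    tt = sum(a[i] * a[j] * rbf_kernel(treated[i], treated[j], gamma)
             for i in range(len(a)) for j in range(len(a)))
    cc = sum(b[i] * b[j] * rbf_kernel(control[i], control[j], gamma)
             for i in range(len(b)) for j in range(len(b)))
    tc = sum(a[i] * b[j] * rbf_kernel(treated[i], control[j], gamma)
             for i in range(len(a)) for j in range(len(b)))
    return tt + cc - 2.0 * tc


def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

Concentrating weight on a few well-matched units typically lowers the discrepancy but also lowers the effective sample size, which is the tension a regularization parameter would have to resolve.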
We propose to use the difference in natural parameters (DINA) to quantify the heterogeneous treatment effect for exponential family models, in contrast to the difference in means. Similarly, we model the hazard ratios for the Cox model. For binary outcomes and survival times, DINA is both convenient and perhaps more practical for modeling the covariates' influences on the treatment effect. We introduce a DINA estimator that is insensitive to confounding and non-collapsibility issues, and that allows practitioners to use powerful off-the-shelf machine learning tools for nuisance estimation. We use extensive simulations to demonstrate the efficacy of the proposed method with various response distributions and censoring mechanisms. We also apply the proposed method to the SPRINT dataset to estimate the heterogeneous treatment effect, demonstrate the method's robustness to nuisance estimation, and conduct a placebo evaluation.
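To make the contrast with the difference in means concrete, the sketch below evaluates both quantities at a single covariate value, given the conditional outcome means under treatment and control. For a Bernoulli outcome the natural parameter is the log-odds, and for a Poisson outcome it is the log of the mean. The function names are hypothetical, and the code only evaluates the estimand; it is not the proposed estimator.

```python
import math


def logit(p):
    """Log-odds, the natural parameter of a Bernoulli distribution."""
    return math.log(p / (1.0 - p))


def difference_in_means(mean_treated, mean_control):
    """Treatment effect on the mean scale."""
    return mean_treated - mean_control


def dina_bernoulli(p_treated, p_control):
    """Difference in natural parameters for a binary outcome (log odds ratio)."""
    return logit(p_treated) - logit(p_control)


def dina_poisson(mean_treated, mean_control):
    """Difference in natural parameters for a count outcome (log rate ratio)."""
    return math.log(mean_treated) - math.log(mean_control)


# The same 0.1 gain in success probability maps to different DINA values
# depending on the baseline risk.
low_baseline = dina_bernoulli(0.6, 0.5)
high_baseline = dina_bernoulli(0.9, 0.8)
```

In this example the difference in means is the same at both baselines, while the DINA is larger at the higher baseline, which shows how the two scales can describe effect heterogeneity differently.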