Case-control studies that use logistic regression to estimate odds-ratios associating risk factors with disease incidence typically include only cases with newly diagnosed disease. Recently proposed methods allow incorporating information on prevalent cases, individuals who survived from disease diagnosis to sampling, into cross-sectionally sampled case-control studies under parametric assumptions for the survival time after diagnosis. Here we propose and study methods to additionally use prospectively observed survival times from prevalent and incident cases to adjust logistic models for the time between disease diagnosis and sampling, the backward time, for prevalent cases. This adjustment yields unbiased odds-ratio estimates from case-control studies that include prevalent cases. We propose a computationally simple two-step generalized method-of-moments estimation procedure. First, we estimate the survival distribution based on a semi-parametric Cox model using an expectation-maximization algorithm that yields fully efficient estimates and accommodates left truncation for the prevalent cases and right censoring. Then, we use the estimated survival distribution in an extension of the logistic model to three groups (controls, incident cases and prevalent cases) to accommodate the survival bias in prevalent cases. In simulations, when the amount of censoring was modest, odds-ratios from the two-step procedure were as efficient as those estimated by jointly optimizing the logistic and survival data likelihoods under parametric assumptions. Even with 90% censoring they were as efficient as estimates obtained using only cross-sectionally available information under parametric assumptions. This indicates that utilizing prospective survival data from the cases lessens model dependency and improves the precision of association estimates for case-control studies with prevalent cases.
The use of case-crossover designs has become widespread in epidemiological and medical investigations of transient associations. However, the most popular reference-selection strategy, the time-stratified schema, is not a suitable solution for controlling bias in case-crossover studies. To demonstrate this, we conducted a time series decomposition of daily ozone (O3) records, scrutinized the ability of the time-stratified schema to control the yearly, monthly and weekly time trends, and found that it failed to control the weekly time trend. Based on this finding, we propose a new logistic regression approach that adjusts for the weekly time trend. We compared the traditional model and the proposed method by simulation, and conducted an empirical study to explore potential associations between air pollutants and acute myocardial infarction (AMI) hospitalizations. In summary, the time-stratified schema provides effective control of the yearly and monthly time trends but not of the weekly time trend. Therefore, the estimate from the traditional logistic regression largely reflects the effect of the weekly time trend rather than the transient effect. In contrast, the proposed logistic regression with adjustment for the weekly time trend can effectively eliminate systematic bias in case-crossover studies.
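For readers unfamiliar with the time-stratified schema, the following is a minimal sketch of how referent days are commonly chosen under it: all days in the same calendar month and year that fall on the same weekday as the case day. The function name `time_stratified_referents` and the example date are illustrative assumptions and do not come from the study; the sketch shows only referent selection, not the proposed weekly-trend adjustment.

```python
from datetime import date, timedelta


def time_stratified_referents(case_day):
    """Return referent days for a case day under the common
    time-stratified schema: same weekday, same month, same year."""
    referents = []
    # Start at the first day of the calendar month containing the case day.
    day = case_day.replace(day=1)
    # Walk forward one day at a time until the month changes.
    while day.month == case_day.month:
        if day.weekday() == case_day.weekday() and day != case_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents


# Hypothetical case day: Wednesday, 15 July 2020.
refs = time_stratified_referents(date(2020, 7, 15))
print(refs)
```

Because every referent shares the case day's weekday, month and year, the stratum is fixed in advance and does not depend on when the event occurred within it.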
Can two separate case-control studies, for example one on hepatitis and the other on fibrosis, be combined? It would be hugely beneficial if two or more separately conducted case-control studies, even ones conducted for entirely unrelated purposes, could be merged in a unified analysis with better statistical properties, e.g., more accurate estimation of parameters. In this paper, we show that, under the popular logistic regression model, the combined (integrative) analysis yields more accurate estimates of the slope parameters than a single case-control study. It is known that, in a single logistic case-control study, the intercept is not identifiable, in contrast to prospective studies. In combined case-control studies, however, the intercepts are proved to be identifiable under mild conditions. The resulting maximum likelihood estimates of the intercepts and slopes are proved to be consistent and asymptotically normal, with asymptotic variances achieving the semiparametric efficiency lower bound.
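The non-identifiability of the intercept in a single case-control study can be seen from a standard textbook argument, sketched here under assumed notation that is not taken from the paper: $D$ is the disease indicator, $S$ indicates selection into the study, and $\pi_1$, $\pi_0$ are the selection probabilities for cases and controls, assumed not to depend on the covariates $x$.

```latex
\begin{align*}
\Pr(D=1 \mid x) &= \frac{\exp(\alpha + \beta^\top x)}{1 + \exp(\alpha + \beta^\top x)}, \\
\Pr(D=1 \mid x, S=1)
  &= \frac{\pi_1 \Pr(D=1 \mid x)}{\pi_1 \Pr(D=1 \mid x) + \pi_0 \Pr(D=0 \mid x)}
   = \frac{\exp(\alpha^{*} + \beta^\top x)}{1 + \exp(\alpha^{*} + \beta^\top x)}, \\
\alpha^{*} &= \alpha + \log\!\left(\pi_1 / \pi_0\right).
\end{align*}
```

The sampled data therefore follow a logistic model with the same slope $\beta$ but a shifted intercept $\alpha^{*}$. Since $\pi_1/\pi_0$ is unknown in a single study, $\alpha$ cannot be recovered from $\alpha^{*}$, whereas $\beta$ can still be estimated.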
We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.
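The bias that differential ascertainment induces in the odds ratio can be illustrated with simple arithmetic. The sketch below uses entirely hypothetical counts and ascertainment probabilities (none are from the National Violent Death Reporting System data) and shows only the bias itself, not the proposed test or correction.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 exposure-by-disease table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical true counts (assumed for illustration only).
true_exposed_cases, true_unexposed_cases = 200, 100
exposed_controls, unexposed_controls = 400, 400

# Hypothetical ascertainment probabilities for cases, differing by exposure group.
p_exposed, p_unexposed = 0.5, 0.8

true_or = odds_ratio(true_exposed_cases, true_unexposed_cases,
                     exposed_controls, unexposed_controls)

# Only a fraction of the cases in each exposure group is actually observed.
observed_or = odds_ratio(true_exposed_cases * p_exposed,
                         true_unexposed_cases * p_unexposed,
                         exposed_controls, unexposed_controls)

# The observed odds ratio equals the true one multiplied by p_exposed / p_unexposed.
bias_factor = observed_or / true_or
print(true_or, observed_or, bias_factor)
```

In this toy setting the observed odds ratio is the true odds ratio scaled by the ratio of the two ascertainment probabilities, so the bias disappears only when cases in both groups are ascertained at the same rate.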
Most clinical trials involve the comparison of a new treatment to a control arm (e.g., the standard of care) and the estimation of a treatment effect. External data, including historical clinical trial data and real-world observational data, are commonly available for the control arm. Borrowing information from external data holds the promise of improving the estimation of relevant parameters and increasing the power of detecting a treatment effect if it exists. In this paper, we propose to use Bayesian additive regression trees (BART) for incorporating external data into the analysis of clinical trials, with a specific goal of estimating the conditional or population average treatment effect. BART naturally adjusts for patient-level covariates and captures potentially heterogeneous treatment effects across different data sources, achieving flexible borrowing. Simulation studies demonstrate that BART compares favorably to a hierarchical linear model and a normal-normal hierarchical model. We illustrate the proposed method with an acupuncture trial.
We propose a variable selection procedure that incorporates prior constraint information into the lasso. The proposed procedure combines sample and prior information, and selects significant variables for the response within a narrower region in which the true parameters lie. This improves the efficiency with which the true model is selected. The procedure can be carried out by many constrained quadratic programming methods, and the initial estimator can be obtained by least squares or a Monte Carlo method. It also enjoys good theoretical properties. Moreover, the procedure is not limited to linear models: it can also be used for generalized linear models (GLM), Cox models, quantile regression models and many others with the help of the least squares approximation (LSA) of Wang and Leng (2007), which approximates these models by linear models. The idea of combining sample and prior constraint information can also be used for other modified lasso procedures. Several examples illustrate the idea of incorporating prior constraint information into variable selection procedures.
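To make the idea of a constrained lasso concrete, here is a minimal sketch that assumes the prior information takes the form of simple lower and upper bounds on each coefficient. It uses coordinate descent with clipping rather than the constrained quadratic programming methods mentioned in the abstract, and the function names, toy data and penalty value are all hypothetical.

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def constrained_lasso(X, y, lam, lower, upper, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 subject to
    lower[j] <= b[j] <= upper[j], by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Start from zero, moved into the feasible box where necessary.
    beta = [min(max(0.0, lower[j]), upper[j]) for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # Unconstrained one-dimensional lasso solution, then clip to the bounds.
            new = soft_threshold(rho, lam) / col_sq[j]
            new = min(max(new, lower[j]), upper[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical toy data with an orthogonal design.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1.7, 2.3, -2.3, -1.7]
inf = math.inf

plain = constrained_lasso(X, y, 0.1, [-inf] * 3, [inf] * 3)
# Assumed prior knowledge: the second coefficient cannot be negative.
constrained = constrained_lasso(X, y, 0.1, [-inf, 0.0, -inf], [inf] * 3)
print(plain, constrained)
```

In this toy example the unconstrained fit keeps a small negative second coefficient, while the bound excludes it, so the search is confined to the region that the prior information permits. Each coordinate update clips the one-dimensional lasso solution to its bounds, which is valid because the one-dimensional objective is convex.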