Incorporating survival data into case-control studies with incident and prevalent cases


Abstract in English

Case-control studies that use logistic regression to estimate odds ratios associating risk factors with disease incidence typically include only cases with newly diagnosed disease. Recently proposed methods allow information on prevalent cases, individuals who survived from disease diagnosis to sampling, to be incorporated into cross-sectionally sampled case-control studies under parametric assumptions for the survival time after diagnosis. Here we propose and study methods that additionally use prospectively observed survival times from prevalent and incident cases to adjust logistic models for the backward time, that is, the time between disease diagnosis and sampling, for prevalent cases. This adjustment yields unbiased odds ratio estimates from case-control studies that include prevalent cases. We propose a computationally simple two-step generalized method-of-moments estimation procedure. First, we estimate the survival distribution based on a semi-parametric Cox model, using an expectation-maximization algorithm that yields fully efficient estimates and accommodates both left truncation for the prevalent cases and right censoring. Then, we use the estimated survival distribution in an extension of the logistic model to three groups (controls, incident cases and prevalent cases) to account for the survival bias in prevalent cases. In simulations, when the amount of censoring was modest, odds ratios from the two-step procedure were as efficient as those estimated by jointly optimizing the logistic and survival data likelihoods under parametric assumptions. Even with 90% censoring, they were as efficient as estimates obtained using only cross-sectionally available information under parametric assumptions. This indicates that utilizing prospective survival data from the cases lessens model dependency and improves the precision of association estimates for case-control studies with prevalent cases.
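The role of left truncation for prevalent cases can be illustrated with a minimal simulation. The sketch below is not the Cox model or expectation-maximization algorithm described above: it assumes, purely for illustration, an exponential survival time after diagnosis, a uniformly distributed backward time, and a fixed follow-up period after sampling. The function names, parameter values and seed are all hypothetical. It compares a naive rate estimate that ignores the backward time with one that counts time at risk only from sampling onward.

```python
import random


def simulate_prevalent_cases(n, rate, window, follow_up, seed=0):
    """Return (backward_time, observed_time, event) tuples for prevalent cases.

    Assumptions (illustrative only): exponential survival after diagnosis,
    backward time uniform on [0, window], fixed follow-up after sampling.
    """
    rng = random.Random(seed)
    cases = []
    while len(cases) < n:
        backward = rng.uniform(0.0, window)  # time from diagnosis to sampling
        survival = rng.expovariate(rate)     # time from diagnosis to death
        if survival <= backward:
            continue  # died before sampling, so never observed (left truncation)
        censor = backward + follow_up        # end of prospective follow-up
        observed = min(survival, censor)     # right-censored survival time
        cases.append((backward, observed, survival <= censor))
    return cases


def exponential_rate(cases, adjust_for_truncation):
    """Exponential rate estimate: number of events divided by time at risk."""
    events = sum(1 for _, _, event in cases if event)
    if adjust_for_truncation:
        # Time at risk starts at sampling, i.e. after the backward time.
        time_at_risk = sum(obs - back for back, obs, _ in cases)
    else:
        # Naive: treats each case as if followed from diagnosis.
        time_at_risk = sum(obs for _, obs, _ in cases)
    return events / time_at_risk


cases = simulate_prevalent_cases(n=20000, rate=0.5, window=5.0, follow_up=3.0)
naive = exponential_rate(cases, adjust_for_truncation=False)
adjusted = exponential_rate(cases, adjust_for_truncation=True)
print(f"true rate 0.5, naive {naive:.3f}, adjusted {adjusted:.3f}")
```

Under these assumptions the naive estimate understates the death rate, because prevalent cases are by construction those who survived their backward time, while the truncation-adjusted estimate should land close to the true rate.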
