Accurately estimating cancer mortality rates and comparing them across cancer sites, populations, or time periods is crucial to public health, because identifying the vulnerable groups who suffer most from these diseases may lead to more efficient cancer care and control with timely treatment. Because cancer mortality rates vary with age, comparisons require age-standardization using a reference population. Using the US Year 2000 Population Standard as the reference is current standard practice, but serious concerns have been raised about its lack of justification. We have found that using the US Year 2000 Population Standard as the reference overestimates prostate cancer mortality rates by 12-91% during the period 1970-2009 across all six sampled U.S. states, and underestimates case fatality rates by 9-78% across six cancer sites: female breast, cervix, prostate, lung, leukemia, and colon-rectum. We develop a mean reference population method that minimizes this bias, using mathematical optimization theory and statistical modeling. The method corrects the bias to the largest extent in terms of squared loss and can be applied broadly to studies of many diseases.
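For readers unfamiliar with direct age-standardization, the following minimal Python sketch shows the conventional calculation whose reference choice the abstract questions: age-specific rates are weighted by the age distribution of a reference population. The function name `age_standardized_rate`, the three age groups, and all counts are hypothetical illustrations; this is not the mean reference population method proposed above.

```python
import numpy as np

def age_standardized_rate(deaths, person_years, ref_pop, per=100_000):
    """Directly age-standardized rate: age-specific rates weighted by a reference age distribution."""
    deaths = np.asarray(deaths, dtype=float)
    person_years = np.asarray(person_years, dtype=float)
    ref_pop = np.asarray(ref_pop, dtype=float)
    rates = deaths / person_years        # age-specific mortality rates
    weights = ref_pop / ref_pop.sum()    # reference population age shares (sum to 1)
    return per * float(weights @ rates)

# Hypothetical data for three age groups (young, middle, old)
deaths = [5, 40, 300]
person_years = [500_000, 400_000, 200_000]
ref_a = [300, 400, 300]   # hypothetical reference population A
ref_b = [500, 350, 150]   # hypothetical, younger reference population B

rate_a = age_standardized_rate(deaths, person_years, ref_a)
rate_b = age_standardized_rate(deaths, person_years, ref_b)
print(f"Reference A: {rate_a:.1f} per 100,000")  # 49.3
print(f"Reference B: {rate_b:.1f} per 100,000")  # 26.5
```

With identical age-specific rates, the younger hypothetical reference B yields a much lower standardized rate than A, which illustrates why the choice of reference population matters.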
The coronavirus disease 2019 (COVID-19) pandemic is an unprecedented global public health challenge. In the United States (US), state governments have implemented various non-pharmaceutical interventions (NPIs), such as physical-distancing closures (lockdowns), stay-at-home orders, and mandatory face masks in public, in response to the rapid spread of COVID-19. To evaluate the effectiveness of these NPIs, we propose a nested case-control design with propensity score weighting under the quasi-experiment framework to estimate the average intervention effect on disease transmission across states. We further develop a method to test for factors that moderate the intervention effect, to assist precision public health intervention. Our method accounts for the underlying dynamics of disease transmission and balances state-level pre-intervention characteristics. We prove that, under certain assumptions, our estimator identifies the causal intervention effect. We apply this method to US COVID-19 incident case data to estimate the effects of six interventions. We show that lockdown has the largest effect on reducing transmission and that reopening bars significantly increases transmission. States with a higher percentage of non-white population are at greater risk of increased $R_t$ associated with reopening bars.
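The quasi-experimental idea of balancing pre-intervention characteristics with propensity scores can be illustrated with a generic inverse-probability-weighting (IPW) estimator. The minimal Python sketch below assumes a single simulated covariate, a logistic propensity model fitted by Newton-Raphson, and a simulated outcome standing in for a change in transmission. It is a textbook IPW example, not the nested case-control design proposed in the abstract, and all names and numbers are hypothetical.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression by Newton-Raphson; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                          # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])     # Fisher information
        beta += np.linalg.solve(hess, grad)
    return beta

def ipw_ate(y, t, e):
    """Normalized (Hajek) IPW estimate of the average treatment effect."""
    w1 = t / e
    w0 = (1 - t) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Simulated units: x is a hypothetical pre-intervention covariate that
# affects both the chance of intervening and the outcome (confounding).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)
y = 0.2 - 0.5 * t + 0.6 * x + rng.normal(scale=0.3, size=n)  # true effect: -0.5

X = np.column_stack([np.ones(n), x])
e = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))   # estimated propensity scores

naive = y[t == 1].mean() - y[t == 0].mean()
ate = ipw_ate(y, t, e)
print(f"naive difference: {naive:.3f}, IPW estimate: {ate:.3f}")
```

The naive difference in means is biased upward because units with larger x both intervene more often and have higher outcomes; weighting by 1/e and 1/(1-e) rebalances x across the two groups and yields an estimate close to the simulated effect.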
Although Electronic Health Records (EHRs) contain a large amount of information about patients' conditions and responses to treatment, which could potentially revolutionize clinical practice, such information is seldom used because of the complexity of its extraction and analysis. We report a first integration of a natural language processing (NLP) framework for analyzing the clinical records of lung cancer patients who use the telephone assistance service of a major Spanish hospital. Specifically, we show how relevant data on patient demographics and health conditions can be extracted, and how analyses aimed at improving the usefulness of the service can be performed. We thus demonstrate that using EHR texts, and integrating them into a data analysis framework, is technically feasible and worthy of further study.
This paper proposes a two-fold factor model for high-dimensional functional time series (HDFTS), which enables the modeling and forecasting of multi-population mortality under the functional data framework. The proposed model first decomposes the HDFTS into lower-dimensional functional time series (the common feature) and a system of basis functions specific to different cross-sections (the heterogeneity). The lower-dimensional common functional time series are then further reduced to low-dimensional scalar factor matrices. These dimensionally reduced factor matrices can adequately convey the useful information in the original HDFTS, and all the temporal dynamics contained in the original HDFTS are extracted to facilitate forecasting. The proposed model can be regarded as a generalization of several existing functional factor models. Through a Monte Carlo simulation, we demonstrate the performance of the proposed method in model fitting. In an empirical study of Japanese subnational age-specific mortality rates, we show that the proposed model produces more accurate point and interval forecasts of multi-population mortality than existing functional factor models. The financial impact of these forecast improvements is demonstrated through a comparison of life annuity pricing.
This paper extends Bayesian mortality projection models for multiple populations to account for the stochastic structure and the effect of spatial autocorrelation among observations. We explain high levels of overdispersion through adjacent locations using a conditional autoregressive (CAR) model. In an empirical study, we compare different hierarchical projection models for analyzing geographical diversity in mortality, by age, between Japanese counties over multiple years. Results obtained by Markov chain Monte Carlo (MCMC) computation demonstrate the flexibility and predictive performance of the proposed model.
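The conditional autoregressive (CAR) structure mentioned above can be sketched with the common "proper CAR" specification, in which spatial effects have precision matrix Q = tau (D - rho W), where W is a 0/1 adjacency matrix and D is the diagonal matrix of neighbour counts. The Python sketch below builds Q for a hypothetical five-region map, draws spatial effects via a Cholesky factorization, and evaluates the CAR conditional mean. The parameterization and the values of rho and tau are assumptions for illustration, not the authors' hierarchical model.

```python
import numpy as np

def car_precision(W, rho, tau):
    """Proper CAR precision matrix Q = tau * (D - rho * W)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))   # number of neighbours of each region
    return tau * (D - rho * W)

def sample_car(Q, rng, size=1):
    """Draw columns from N(0, Q^{-1}) using the Cholesky factor of Q."""
    L = np.linalg.cholesky(Q)                    # Q = L @ L.T
    z = rng.standard_normal((Q.shape[0], size))
    return np.linalg.solve(L.T, z)               # Cov = (L L^T)^{-1} = Q^{-1}

def car_conditional_mean(phi, W, rho, i):
    """E[phi_i | phi_{-i}] = rho * (average of neighbouring values)."""
    W = np.asarray(W, dtype=float)
    return rho * (W[i] @ phi) / W[i].sum()

# Hypothetical map of five regions; W[i, j] = 1 if regions i and j share a border
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]])
Q = car_precision(W, rho=0.9, tau=2.0)
rng = np.random.default_rng(1)
phi = sample_car(Q, rng, size=1)[:, 0]
print(phi, car_conditional_mean(phi, W, rho=0.9, i=2))
```

For |rho| < 1 the matrix D - rho W is positive definite, so the Cholesky factorization exists; values of rho near 1 correspond to strong similarity between neighbouring regions.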
Many existing mortality models, such as the Lee-Carter model and its variants, follow the framework of classical factor models. Latent common factors in factor models are defined as time-related mortality indices (such as $\kappa_t$ in the Lee-Carter model). Factor loadings, which capture the linear relationship between age variables and latent common factors (such as $\beta_x$ in the Lee-Carter model), are assumed to be time-invariant in the classical framework. This assumption is usually too restrictive in practice, as mortality datasets typically span a long period of time. Driving forces such as medical advances against certain diseases, environmental changes and technological progress may significantly influence the relationships between variables. In this paper, we first develop a factor model with time-varying factor loadings (time-varying factor model) as an extension of the classical factor model for mortality modelling. Two methods for extrapolating the factor loadings, the local regression method and the naive method, are proposed for forecasting with the time-varying factor model. From the empirical data analysis, we find that the new model captures the empirical feature of time-varying factor loadings and improves mortality forecasts over different horizons and countries. Further, we propose a novel approach based on change point analysis to estimate the optimal boundary between short-term forecasting, for which the local linear regression method is favoured, and long-term forecasting, for which the naive method is favoured. Additionally, simulation studies are provided to show the performance of the time-varying factor model under various scenarios.
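The classical Lee-Carter structure referred to above, $\log m_{x,t} = \alpha_x + \beta_x \kappa_t$, is commonly fitted by a singular value decomposition under the constraints $\sum_x \beta_x = 1$ and $\sum_t \kappa_t = 0$. The Python sketch below fits it to simulated data and forecasts $\kappa_t$ with a random walk with drift. The rolling-window re-estimation of $\beta_x$ (`rolling_beta`) is only a crude, hypothetical diagnostic of whether loadings drift over time; it is not the time-varying factor model or the forecasting methods proposed in the paper, and all data are simulated.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit log m[x, t] = alpha[x] + beta[x] * kappa[t] by SVD (rows: ages, columns: years)."""
    alpha = log_m.mean(axis=1)                 # age pattern averaged over years
    Z = log_m - alpha[:, None]                 # centred matrix; each row sums to zero
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scale = U[:, 0].sum()
    beta = U[:, 0] / scale                     # constraint: sum(beta) = 1
    kappa = s[0] * Vt[0] * scale               # sum(kappa) = 0 because rows of Z are centred
    return alpha, beta, kappa

def forecast_kappa(kappa, h):
    """Random walk with drift: extend kappa at its average historical change per year."""
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)
    return kappa[-1] + drift * np.arange(1, h + 1)

def rolling_beta(log_m, window):
    """Hypothetical diagnostic: re-estimate beta on moving windows of years."""
    n_years = log_m.shape[1]
    return np.array([fit_lee_carter(log_m[:, s:s + window])[1]
                     for s in range(n_years - window + 1)])

# Simulated data: 5 age groups, 40 years, with fixed (time-invariant) loadings
rng = np.random.default_rng(2)
alpha_true = np.linspace(-8.0, -2.0, 5)
beta_true = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
kappa_true = np.linspace(10.0, -10.0, 40)
log_m = (alpha_true[:, None] + np.outer(beta_true, kappa_true)
         + rng.normal(scale=0.01, size=(5, 40)))

alpha, beta, kappa = fit_lee_carter(log_m)
print(np.round(beta, 3))
print(np.round(forecast_kappa(kappa, 5), 2))
print(np.round(rolling_beta(log_m, window=20)[[0, -1]], 3))
```

Because the simulated loadings are fixed, the first and last rolling estimates stay close to `beta_true`; with real data, systematic drift in such estimates is the kind of pattern that motivates time-varying factor loadings.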