We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions. We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries, and works well for all countries. We assess various possible methods using subnational TFR data for 47 countries. We find that the method that performs best, both in out-of-sample predictive performance and in reproducing the within-country correlation in TFR, is one that scales the national trajectory by a region-specific scale factor that is allowed to vary slowly over time. This supports the hypothesis of Watkins (1990, 1991) that within-country TFR converges over time in response to country-specific factors, and extends the Watkins hypothesis to the last 50 years and to a much wider range of countries around the world.
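The scaling idea described above can be illustrated with a minimal sketch. It assumes the scale factor follows a first-order autoregressive process around 1; that form, the function name `project_regional_tfr`, and the values of `phi` and `sigma` are illustrative assumptions, not details given in the abstract.

```python
import random

def project_regional_tfr(national_tfr, alpha0, phi=0.9, sigma=0.02, seed=0):
    """Simulate one regional TFR trajectory by scaling a national trajectory.

    Assumption: the regional scale factor alpha follows an AR(1) process
    around 1, so it changes slowly from one period to the next.
    """
    rng = random.Random(seed)
    alpha = alpha0
    regional = []
    for tfr in national_tfr:
        # Update the scale factor: partial persistence plus a small shock.
        alpha = 1.0 + phi * (alpha - 1.0) + rng.gauss(0.0, sigma)
        # Regional TFR is the national TFR times the regional scale factor.
        regional.append(alpha * tfr)
    return regional

# Hypothetical national trajectory; in practice this would be one draw
# from the national probabilistic projection.
national = [2.8, 2.6, 2.4, 2.3, 2.2]
print(project_regional_tfr(national, alpha0=1.2))
```

Repeating the simulation over many national trajectories and random seeds would give a predictive distribution for the region, with all regions of a country sharing the same national draw.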
Accurate estimates of subnational populations are important for policy formulation and for monitoring population health indicators. For example, estimates of the number of women of reproductive age are needed to understand the population at risk of maternal mortality and unmet need for contraception. However, in many low-income countries, data on population counts and components of population change are limited, and so subnational levels and trends are unclear. We present a Bayesian constrained cohort component model for the estimation and projection of subnational populations. The model builds on a cohort component projection framework, incorporates census data and estimates from the United Nations World Population Prospects, and uses characteristic mortality schedules to obtain estimates of population counts and the components of population change, including internal migration. The data required as inputs to the model are minimal and available across a wide range of countries, including most low-income countries. The model is applied to estimate and project populations by county in Kenya for 1979-2019, and validated against the 2019 Kenyan census.
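For readers unfamiliar with cohort component projection, a single deterministic projection step can be sketched as follows. This is a deliberately simplified illustration: it covers a female-only population, ignores infant survival and mid-period exposure, and leaves out the Bayesian estimation and constraints entirely. The function name and all input values are hypothetical.

```python
def project_one_step(pop, survival, fertility, net_migration, share_female=0.49):
    """Project a female population by age group one period ahead.

    pop[a]           women in age group a at the start of the period
    survival[a]      proportion surviving into the next age group
                     (for the last, open-ended group: proportion remaining)
    fertility[a]     births per woman in age group a over the period
    net_migration[a] net migrants added to age group a at the end of the period
    share_female     assumed share of births that are female
    """
    n = len(pop)
    new_pop = [0.0] * n
    # Births enter the first age group (simplification: no infant mortality).
    births = sum(f * p for f, p in zip(fertility, pop))
    new_pop[0] = births * share_female
    # Survivors of each cohort move up one age group.
    for a in range(n - 1):
        new_pop[a + 1] += pop[a] * survival[a]
    # Survivors of the open-ended group stay in it.
    new_pop[n - 1] += pop[n - 1] * survival[n - 1]
    # Add net migration by age group.
    return [p + m for p, m in zip(new_pop, net_migration)]

# Hypothetical three-age-group example.
print(project_one_step(
    pop=[100.0, 100.0, 100.0],
    survival=[0.9, 0.8, 0.5],
    fertility=[0.0, 0.5, 0.0],
    net_migration=[0.0, 10.0, -5.0],
    share_female=0.5,
))
```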
This paper sets out a forecasting method that employs a mixture of parametric functions to capture the pattern of fertility with respect to age. The overall level of cohort fertility is decomposed over the range of fertile ages using a mixture of parametric density functions. The level of fertility and the parameters describing the shape of the fertility curve are projected forward using time series methods. The model is estimated within a Bayesian framework, allowing predictive distributions of future fertility rates to be produced that naturally incorporate both time series and parametric uncertainty. A number of choices are possible for the precise form of the functions used in the two-component mixtures. The performance of several model variants is tested on data from four countries: England and Wales, the USA, Sweden and France. The former two countries exhibit multi-modality in their fertility rate curves as a function of age, while the latter two are largely uni-modal. The models are estimated using Hamiltonian Monte Carlo and the `stan` software package on data covering the period up to 2006, with the period 2007-2016 held back for assessment purposes. Forecasting performance is found to be comparable to that of other models identified in the literature as producing accurate fertility forecasts.
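The decomposition of a fertility level over age by a two-component mixture can be sketched as below. The sketch uses two normal densities purely for illustration; the abstract does not say which parametric forms are used, and the function names and parameter values here are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def age_specific_fertility(age, level, weight, mean1, sd1, mean2, sd2):
    """Fertility rate at a given age: overall level times a mixture density.

    weight is the share of fertility attributed to the first (younger)
    component; the remainder goes to the second component.
    """
    density = (weight * normal_pdf(age, mean1, sd1)
               + (1.0 - weight) * normal_pdf(age, mean2, sd2))
    return level * density

# Evaluate at the midpoint of each single year of age from 15 to 49.
rates = [
    age_specific_fertility(a + 0.5, level=1.9, weight=0.3,
                           mean1=21.0, sd1=3.0, mean2=31.0, sd2=5.0)
    for a in range(15, 50)
]
# The rates sum to slightly less than the level because a little of the
# density falls outside ages 15 to 49.
print(round(sum(rates), 3))
```

With two well-separated component means the curve can be bimodal, while a small `weight` or close means gives an essentially unimodal curve. In a forecasting setting, `level` and the shape parameters would each be given a time series model.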
Reliable mortality estimates at the subnational level are essential in the study of health inequalities within a country. One of the difficulties in producing such estimates is the presence of small populations, where the stochastic variation in death counts is relatively high, and so the underlying mortality levels are unclear. We present a Bayesian hierarchical model to estimate mortality at the subnational level. The model builds on characteristic age patterns in mortality curves, which are constructed using principal components from a set of reference mortality curves. Information on mortality rates is pooled across geographic space and smoothed over time. Testing of the model shows reasonable estimates and uncertainty levels when the model is applied both to simulated data that mimic US counties and to real data for French departments. The estimates produced by the model have direct applications to the study of subregional health patterns and disparities.
The sex ratio at birth (SRB) in India has been reported as imbalanced since the 1970s. Previous studies have shown great variation in the SRB across geographic locations in India up to 2016. Because India is one of the most populous countries and shows great regional heterogeneity, it is crucial to produce probabilistic projections of the SRB at state level for the purposes of population projection and policy planning. In this paper, we implement a Bayesian hierarchical time series model to project the SRB in India by state. We generate probabilistic SRB projections from 2017 to 2030 for 29 States and Union Territories (UTs) in India, and present results for the 21 States/UTs with data from the Sample Registration System. Our analysis takes into account two state-specific factors that contribute to sex-selective abortion and the resulting sex imbalances at birth: intensity of son preference and fertility squeeze. We project that the largest contribution to female birth deficits is in Uttar Pradesh, with the cumulative number of missing female births projected to be 2.0 (95% credible interval [1.9; 2.2]) million from 2017 to 2030. The total female birth deficit during 2017-2030 for the whole of India is projected to be 6.8 [6.6; 7.0] million.
Statistical and computational methods are widely used in today's scientific studies. Using female fertility potential in childhood cancer survivors as an example, we illustrate how these methods can be used to extract insight regarding biological processes from noisy observational data in order to inform decision making. We start by contextualizing the computational methods with the working example: the modelling of acute ovarian failure risk in female childhood cancer survivors to quantify the risk of permanent ovarian failure due to exposure to lifesaving but nonetheless toxic cancer treatments. This is followed by a description of the general framework of classification problems. We provide an overview of the modelling algorithms employed in our example, including one classic model (logistic regression) and two popular modern learning methods (random forest and support vector machines). Using the working example, we show the general steps of data preparation for modelling, the variable selection steps for the classic model, and how model performance might be improved using visualization tools. We end with a note on the importance of model evaluation.