Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with mutation motifs, which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an $\ell_1$-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
The fitness of a biological strategy is typically measured by its expected reproductive rate, the first moment of its offspring distribution. However, strategies with high expected rates can also have high probabilities of extinction. A similar situation is found in gambling and investment, where strategies with a high expected payoff can also have a high risk of ruin. We take inspiration from the gambler's ruin problem to examine how extinction is related to population growth. Using moment theory, we demonstrate how higher moments can impact the probability of extinction. We discuss how moments can be used to find bounds on the extinction probability, focusing on s-convex ordering of random variables, a method developed in actuarial science. This approach generates best-case and worst-case scenarios to provide upper and lower bounds on the probability of extinction. Our results demonstrate that even the most fit strategies can have high probabilities of extinction.
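To illustrate how higher moments matter, here is a minimal sketch that assumes a Galton-Watson branching process (an assumption; the abstract does not name a specific population model). Under that assumption, the extinction probability starting from one individual is the smallest fixed point in [0, 1] of the offspring probability generating function, and it can be approximated by iterating that function from zero. The function names and the two offspring distributions are hypothetical, chosen so that both have the same mean of 1.5. This is not the s-convex bounding method itself.

```python
def offspring_mean(pmf):
    """First moment of an offspring distribution given as {count: probability}."""
    return sum(k * p for k, p in pmf.items())


def extinction_probability(pmf, iterations=500):
    """Smallest fixed point of the PGF f(s) = sum_k p_k * s**k on [0, 1].

    Assumes a Galton-Watson process with one initial individual.
    Iterating q <- f(q) from q = 0 increases monotonically to that fixed point.
    """
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q**k for k, p in pmf.items())
    return q


# Two hypothetical strategies with identical expected reproductive rate.
strategy_a = {0: 0.25, 2: 0.75}  # mean 1.5, lower variance
strategy_b = {0: 0.50, 3: 0.50}  # mean 1.5, higher variance

for name, pmf in [("A", strategy_a), ("B", strategy_b)]:
    print(name, offspring_mean(pmf), round(extinction_probability(pmf), 4))
```

Both strategies have the same first moment, yet the iteration gives an extinction probability of about 0.3333 for strategy A and about 0.6180 for strategy B, so the mean alone does not determine the risk of extinction.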
In the process of clinical diagnosis and treatment, the restricted mean survival time (RMST), which reflects the life expectancy of patients up to a specified time, can be used as an appropriate outcome measure. However, the RMST only calculates the mean survival time of patients within a period of time after the start of follow-up and may not accurately portray the change in a patient's life expectancy over time. The life expectancy can be adjusted for the time the patient has already survived and defined as the conditional restricted mean survival time (cRMST). A dynamic RMST model based on the cRMST can be established by incorporating time-dependent covariates and covariates with time-varying effects. We analysed data from a study of primary biliary cirrhosis (PBC) to illustrate the use of the dynamic RMST model. The predictive performance was evaluated using the C-index and the prediction error. The proposed dynamic RMST model, which can explore the dynamic effects of prognostic factors on survival time, has better predictive performance than the RMST model. Three PBC patient examples were used to illustrate how the predicted cRMST changed at different prediction times during follow-up. The use of the dynamic RMST model based on the cRMST allows for optimization of evidence-based decision-making by updating personalized dynamic life expectancy for patients.
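The difference between the two quantities can be sketched numerically. The example below assumes the common definitions RMST(tau) = integral of S(t) over [0, tau] and cRMST(s, w) = integral of S(t)/S(s) over [s, s + w], where S is the survival function, s is the time already survived and w is the prediction window. The exact definition used in the study may differ, and the function names and the Weibull-type survival curve are hypothetical.

```python
import math


def area_under_survival(survival, start, end, steps=10_000):
    """Trapezoidal approximation of the integral of survival(t) over [start, end]."""
    h = (end - start) / steps
    total = 0.5 * (survival(start) + survival(end))
    total += sum(survival(start + i * h) for i in range(1, steps))
    return total * h


def rmst(survival, tau):
    """Restricted mean survival time up to tau, measured from the start of follow-up."""
    return area_under_survival(survival, 0.0, tau)


def conditional_rmst(survival, s, w):
    """Restricted mean survival time over the next w time units, given survival to s."""
    return area_under_survival(survival, s, s + w) / survival(s)


def weibull_survival(t):
    """Hypothetical survival curve with a hazard that increases over time."""
    return math.exp(-((t / 10.0) ** 2))


print(rmst(weibull_survival, 5.0))                   # 5-year window from baseline
print(conditional_rmst(weibull_survival, 5.0, 5.0))  # 5-year window after surviving 5 years
```

Because the hypothetical hazard increases with time, the conditional value for a patient who has already survived 5 time units is smaller than the baseline RMST over a window of the same length, which is the kind of change a fixed baseline RMST cannot show.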
Smooth backfitting has proven to have a number of theoretical and practical advantages in structured regression. Smooth backfitting projects the data down onto the structured space of interest, providing a direct link between data and estimator. This paper introduces the ideas of smooth backfitting to survival analysis in a proportional hazard model, where we assume an underlying conditional hazard with multiplicative components. We develop asymptotic theory for the estimator and we use the smooth backfitter in a practical application, where we extend recent advances in in-sample forecasting methodology by allowing more information to be incorporated, while still obeying the structured requirements of in-sample forecasting.
The success of DNA nanotechnology has been driven by the discovery of novel structural motifs with a wide range of shapes and uses. We present a comprehensive study of the T-motif, a 3-armed, planar, right-angled junction that has been used in the self-assembly of DNA polyhedra and periodic structures. The motif is formed through the interaction of a bulge loop in one duplex and a sticky end of another. The polarity of the sticky end has significant consequences for the thermodynamic and geometrical properties of the T-motif: different polarities create junctions spanning different grooves of the duplex. We compare experimental binding strengths with predictions of oxDNA, a coarse-grained model of DNA, for various loop sizes. We find that, although both sticky-end polarities can create stable junctions, junctions resulting from 5$'$ sticky ends are stable over a wider range of bulge loop sizes. We highlight the importance of possible coaxial stacking interactions within the motif and investigate how each coaxial stacking interaction stabilises the structure and favours a particular geometry.
Motivated by the classical Susceptible-Infected-Recovered (SIR) epidemic models proposed by Kermack and McKendrick, we consider a class of stochastic compartmental dynamical systems with a notion of partial ordering among the compartments. We call such systems unidirectional Mass Transfer Models (MTMs). We show that there is a natural way of interpreting a unidirectional MTM as a Survival Dynamical System (SDS) that is described in terms of survival functions instead of population counts. This SDS interpretation allows us to employ tools from survival analysis to address various issues with data collection and statistical inference of unidirectional MTMs. In particular, we propose and numerically validate a statistical inference procedure based on SDS-likelihoods. We use the SIR model as a running example throughout the paper to illustrate the ideas.
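The survival-function reading of the SIR model can be sketched as follows, under assumptions not stated in the abstract: the deterministic mass-action equations s' = -beta*s*i and i' = beta*s*i - gamma*i, scaled so that s(0) = 1 and i(0) = rho. With that scaling, s(t) starts at 1 and never increases, so it can be read as the probability that an initially susceptible individual is still uninfected at time t. The function name, the parameter values and the use of a fixed-step Runge-Kutta integrator are all illustrative choices, not the paper's procedure.

```python
def sir_survival_curve(beta, gamma, rho, t_max, dt=0.01):
    """Integrate the scaled SIR equations with classical RK4.

    Returns (times, s_values), where s_values[k] is the susceptible
    fraction at times[k], read here as a survival function.
    """
    def deriv(s, i):
        new_infections = beta * s * i
        return -new_infections, new_infections - gamma * i

    s, i = 1.0, rho
    times, s_values = [0.0], [s]
    n_steps = int(round(t_max / dt))
    for step in range(n_steps):
        k1 = deriv(s, i)
        k2 = deriv(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = deriv(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        times.append((step + 1) * dt)
        s_values.append(s)
    return times, s_values


# Hypothetical parameter values, for illustration only.
times, s_values = sir_survival_curve(beta=2.0, gamma=1.0, rho=0.01, t_max=30.0)
print(s_values[0], s_values[-1])
```

Because the curve is non-increasing and starts at 1, quantities such as 1 - s(t) behave like a (possibly improper) distribution function of the infection time, which is what makes survival-analysis tools applicable to individual-level infection-time data.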