ﻻ يوجد ملخص باللغة العربية
B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same clonal family) are released from the germinal center, their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called substitution profiles, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method Substitution Profiles Using Related Families (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on an external dataset. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.
In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of species overlap and diversity borrowed from ecological studies. While this approach is robust i
We are frequently faced with a large collection of antibodies, and want to select those with highest affinity for their cognate antigen. When developing a first-line therapeutic for a novel pathogen, for instance, we might look for such antibodies in
The collection of immunoglobulin genes in an individuals germline, which gives rise to B cell receptors via recombination, is known to vary significantly across individuals. In humans, for example, each individual has only a fraction of the several h
Probabilistic modeling is fundamental to the statistical analysis of complex data. In addition to forming a coherent description of the data-generating process, probabilistic models enable parameter inference about given data sets. This procedure is
Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new o