We review algorithms for protein design in general. Although these algorithms have a rich combinatorial, geometric, and mathematical structure, they are almost never covered in computer science classes. Furthermore, many of these algorithms admit provable guarantees of accuracy, soundness, complexity, completeness, optimality, and approximation bounds. The algorithms represent a delicate and beautiful balance between discrete and continuous computation and modeling, analogous to that seen in robotics, computational geometry, and other fields in computational science. Finally, computer scientists may be unaware of the almost direct impact of these algorithms on predicting and introducing molecular therapies that have gone in a short time from mathematics to algorithms to software to predictions to preclinical testing to clinical trials. Indeed, the overarching goal of these algorithms is to enable the development of new therapeutics that might be impossible or too expensive to discover using experimental methods. Thus these algorithms could have a significant impact on individual, community, and global health.
A general strategy is described for finding which amino acid sequences have native states in a desired conformation (inverse design). The approach is used to design sequences of 48 hydrophobic and polar amino acids on three-dimensional lattice structures. Previous studies employing a sequence-space Monte Carlo technique resulted in the successful design of one sequence in ten attempts. The present work also entails the exploration of conformations that compete significantly with the target structure for being its ground state. The design procedure is successful in all ten cases.
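The role of competing conformations in inverse design can be illustrated with a minimal sketch. It assumes the standard HP lattice energy (each non-bonded hydrophobic contact contributes -1), which the abstract does not spell out; the four-residue chain, the function names, and the single competitor are hypothetical and chosen only to keep the example small.

```python
from itertools import combinations

def contacts(conformation):
    """Index pairs that are lattice neighbours but not adjacent along the chain."""
    pairs = []
    for i, j in combinations(range(len(conformation)), 2):
        if j - i > 1:
            dist = sum(abs(a - b) for a, b in zip(conformation[i], conformation[j]))
            if dist == 1:
                pairs.append((i, j))
    return pairs

def hp_energy(sequence, conformation):
    """Assumed HP energy: -1 for every non-bonded H-H contact."""
    return -sum(
        1 for i, j in contacts(conformation)
        if sequence[i] == "H" and sequence[j] == "H"
    )

def energy_gap(sequence, target, competitors):
    """Energy of the best competitor minus energy of the target.

    A positive gap means the target is the unique lowest-energy state
    among the conformations considered.
    """
    return min(hp_energy(sequence, c) for c in competitors) - hp_energy(sequence, target)

# Hypothetical toy chain of four residues on a cubic lattice.
TARGET = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]    # compact square
EXTENDED = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]  # straight line

print(energy_gap("HPPH", TARGET, [EXTENDED]))  # 1: the target is favoured
print(energy_gap("PHHP", TARGET, [EXTENDED]))  # 0: the target is degenerate
```

A design loop would mutate the sequence and keep changes that widen this gap; scoring against competitors, rather than minimising the target energy alone, is what rules out sequences that fold equally well into other structures.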
Free energy landscapes decisively determine the progress of enzymatically catalyzed reactions [1]. Time-resolved macromolecular crystallography unifies transient-state kinetics with structure determination [2-4] because both can be determined from the same set of X-ray data. We demonstrate here how barriers of activation can be determined solely from five-dimensional crystallography [5]. Directly linking molecular structures with barriers of activation between them allows for gaining insight into the structural nature of the barrier. We analyze comprehensive time series of crystallographic data at 14 different temperature settings and determine entropy and enthalpy contributions to the barriers of activation. 100 years after the discovery of X-ray scattering, we advance X-ray structure determination to a new frontier, the determination of energy landscapes.
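One common way to separate enthalpy and entropy contributions from temperature-dependent rates is a linear fit of the Eyring equation from transition-state theory, ln(k/T) = ln(k_B/h) + ΔS/R - ΔH/(RT). The abstract does not state which rate law was used, so the sketch below is an assumption; the temperatures and rate coefficients are synthetic, not data from the study.

```python
import math

R = 8.314462618      # gas constant, J/(mol K)
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s

def eyring_rate(temp, d_h, d_s):
    """Rate coefficient (1/s) from activation enthalpy (J/mol) and entropy (J/(mol K))."""
    return (K_B * temp / H) * math.exp(d_s / R) * math.exp(-d_h / (R * temp))

def fit_eyring(temps, rates):
    """Least-squares line through ln(k/T) versus 1/T; returns (d_h, d_s)."""
    xs = [1.0 / t for t in temps]
    ys = [math.log(k / t) for k, t in zip(rates, temps)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    d_h = -slope * R                              # slope = -dH/R
    d_s = (intercept - math.log(K_B / H)) * R     # intercept = ln(kB/h) + dS/R
    return d_h, d_s

# Synthetic example: 14 hypothetical temperature settings and a made-up barrier.
temps = [260.0 + 5.0 * i for i in range(14)]
rates = [eyring_rate(t, 50_000.0, -20.0) for t in temps]
d_h, d_s = fit_eyring(temps, rates)
print(f"dH = {d_h:.1f} J/mol, dS = {d_s:.2f} J/(mol K)")
```

With real data the rate coefficients would come from kinetic modelling of the time series at each temperature, and the scatter of the fit would set the uncertainty on both quantities.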
An innovative strategy for the optimal design of planar frames able to resist seismic excitations is proposed here. The procedure is based on genetic algorithms (GA) arranged in a nested structure that is suitable for parallel computing on several devices. In particular, the approach uses two nested genetic algorithms. The first one, named External GA, seeks, among a predefined list of profiles, the sizes of the structural elements of the frame that give the best-performing solution, that is, the one with the highest value of an appropriate fitness function. This function takes into account, among other considerations, the seismic safety factor and the failure mode, which are calculated by means of the second algorithm, named Internal GA. The details of the proposed procedure are provided, and applications to the seismic design of two frames of different size are described.
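The nesting described above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the profile catalogue, the member demands, the toy load-multiplier formula standing in for a structural analysis, the fitness weights, and the GA operators. Only the structure follows the abstract: an External GA over profile choices whose fitness calls an Internal GA that searches for the critical failure mode.

```python
import random

# Hypothetical catalogue: (bending capacity, weight) for each profile.
PROFILES = [(40.0, 1.0), (60.0, 1.6), (90.0, 2.5), (130.0, 3.8)]
# Hypothetical seismic demand on each of the four frame members.
DEMANDS = [30.0, 45.0, 45.0, 30.0]

def evolve(fitness, random_individual, mutate, rng, pop_size=12, generations=15):
    """Tiny elitist GA: keep the best half, refill with mutated one-point crossovers."""
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))
            children.append(mutate(a[:cut] + b[cut:], rng))
        population = parents + children
    return max(population, key=fitness)

def load_multiplier(design, mechanism):
    """Toy stand-in for a collapse analysis: resisting work over driving work."""
    caps = [PROFILES[p][0] for p in design]
    resisting = sum(c * w for c, w in zip(caps, mechanism))
    driving = sum(d * w for d, w in zip(DEMANDS, mechanism))
    return resisting / driving

def internal_ga(design):
    """Internal GA: find the mechanism (failure mode) with the lowest multiplier."""
    rng = random.Random(0)  # fixed seed keeps the outer fitness deterministic

    def random_mechanism(r):
        return [r.uniform(0.01, 1.0) for _ in design]

    def mutate(mech, r):
        mech = list(mech)
        i = r.randrange(len(mech))
        mech[i] = min(1.0, max(0.01, mech[i] + r.gauss(0.0, 0.2)))
        return mech

    worst = evolve(lambda m: -load_multiplier(design, m), random_mechanism, mutate, rng)
    return load_multiplier(design, worst)

def external_fitness(design):
    """Reward safety up to a hypothetical target of 1.5 and penalise weight."""
    weight = sum(PROFILES[p][1] for p in design)
    return min(internal_ga(design), 1.5) - 0.05 * weight

def external_ga(seed=1):
    """External GA: choose one profile index per member."""
    rng = random.Random(seed)

    def random_design(r):
        return [r.randrange(len(PROFILES)) for _ in DEMANDS]

    def mutate(design, r):
        design = list(design)
        if r.random() < 0.3:
            design[r.randrange(len(design))] = r.randrange(len(PROFILES))
        return design

    return evolve(external_fitness, random_design, mutate, rng)

best = external_ga()
print(best, internal_ga(best))
```

Because each call to `external_fitness` is independent of the others, the evaluations within one generation could be distributed across devices, which is presumably what makes the nested arrangement attractive for parallel computing.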
Model-based process design of ion-exchange simulated moving bed (IEX-SMB) chromatography for center-cut separation of proteins is studied. The use of nonlinear binding models, which describe the adsorption behaviour of macromolecules more accurately, can make it impossible to use triangle theory to obtain operating parameters. Moreover, triangle theory provides no rules for designing salt profiles in IEX-SMB. In the modelling study here, the separation of proteins (i.e., ribonuclease, cytochrome and lysozyme) on chromatographic columns packed with the strong cation-exchanger SP Sepharose FF is used as an example system. The general rate model with steric mass-action kinetics was used; two closed-loop IEX-SMB network schemes were investigated (i.e., cascade and eight-zone schemes). The performance of the IEX-SMB schemes was examined with respect to multi-objective indicators (i.e., purity and yield) and productivity, and compared to a single-column batch system using the same amount of resin. A multi-objective sampling algorithm, Markov Chain Monte Carlo (MCMC), was used to generate samples for constructing the Pareto optimal fronts. MCMC is used here for sampling: it draws the Pareto optimal points as well as points close to them. The Pareto fronts of the three schemes provide full information on the trade-off between the conflicting indicators of purity and yield. The results indicate that the cascade IEX-SMB scheme and the integrated eight-zone IEX-SMB scheme perform similarly and that both outperform the single-column batch system.
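Once samples of (purity, yield) are available, the Pareto front is the subset that no other sample beats on both indicators at once. The sketch below shows only this filtering step; the sample values are invented for illustration, and the function name is hypothetical rather than taken from the study's software.

```python
def pareto_front(points):
    """Return the non-dominated points when both coordinates are maximised.

    A point p is dominated if some other point q is at least as good in
    both purity and yield (and is not identical to p).
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))

# Invented (purity, yield) samples, standing in for MCMC output.
samples = [
    (0.99, 0.60),
    (0.95, 0.80),
    (0.90, 0.92),
    (0.94, 0.75),  # dominated by (0.95, 0.80)
    (0.85, 0.90),  # dominated by (0.90, 0.92)
    (0.99, 0.55),  # dominated by (0.99, 0.60)
]

print(pareto_front(samples))
# [(0.9, 0.92), (0.95, 0.8), (0.99, 0.6)]
```

Comparing two process schemes then amounts to comparing their fronts: a scheme whose front lies above and to the right of another's offers a better purity for any given yield.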
Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly-parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern FPGA (Field-Programmable Gate Array) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8x. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability: https://github.com/CMU-SAFARI/Shouji
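The general idea of pre-alignment filtering can be sketched as follows. This is not Shouji's algorithm or its FPGA design; it is a much simpler lower-bound filter written for illustration, with hypothetical function names. It counts text positions that match no pattern character on any diagonal within the edit threshold. Each such position must be substituted or deleted in any alignment within the threshold, so a count above the threshold safely rules the pair out before dynamic programming is run.

```python
def unmatched_positions(text, pattern, max_edits):
    """Count text positions with no equal pattern character within +/- max_edits."""
    count = 0
    for j, ch in enumerate(text):
        lo = max(0, j - max_edits)
        hi = min(len(pattern), j + max_edits + 1)
        if ch not in pattern[lo:hi]:
            count += 1
    return count

def passes_filter(text, pattern, max_edits):
    """Cheap check: False means the edit distance certainly exceeds max_edits."""
    return unmatched_positions(text, pattern, max_edits) <= max_edits

def edit_distance(a, b):
    """Standard dynamic-programming edit distance (the costly verification step)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def verify_pairs(pairs, max_edits):
    """Run dynamic programming only on the pairs the filter lets through."""
    return [
        (t, p) for t, p in pairs
        if passes_filter(t, p, max_edits) and edit_distance(t, p) <= max_edits
    ]

pairs = [("ACGTACGT", "ACGTTCGT"), ("AAAAAAAA", "CCCCCCCC")]
print([passes_filter(t, p, 1) for t, p in pairs])  # [True, False]
print(verify_pairs(pairs, 1))  # [('ACGTACGT', 'ACGTTCGT')]
```

A filter of this kind never rejects a pair that is within the threshold, but it can let through pairs that are not; the alignment step stays unchanged and still decides the final answer, which matches the abstract's point that the aligner's capabilities are not sacrificed.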