No Arabic abstract
The origin and early spread of 2019-nCoV is studied by phylogenetic analysis using IC-PIC alignment-free method based on DNA/RNA sequence information correlation (IC) and partial information correlation (PIC). The topology of phylogenetic tree of Betacoronavirus is remarkably consistent with biologists systematics, classifies 2019-nCoV as Sarbecovirus of Betacoronavirus and supports the assumption that these novel viruses are of bat origin with pangolin as one of the possible intermediate hosts. The novel virus branch of phylogenetic tree shows location-virus linkage. The placement of root of the early 2019-nCoV tree is studied carefully in Neighbor Joining consensus algorithm by introducing different out-groups (Bat-related coronaviruses, Pangolin coronaviruses and HIV viruses etc.) and comparing with UPGMA consensus trees. Several oldest branches (lineages) of the 2019-nCoV tree are deduced that means the COVID-19 may begin to spread in several regions in the world before its outbreak in Wuhan.
RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, which is a time-consuming step in all current methods. Sailfish provides quantification estimates much faster than existing approaches (typically 20-times faster) without loss of accuracy.
Since the SARS outbreak in 2003, a lot of predictive epidemiological models have been proposed. At the end of 2019, a novel coronavirus, termed as 2019-nCoV, has broken out and is propagating in China and the world. Here we propose a multi-model ordinary differential equation set neural network (MMODEs-NN) and model-free methods to predict the interprovincial transmissions in mainland China, especially those from Hubei Province. Compared with the previously proposed epidemiological models, the proposed network can simulate the transportations with the ODEs activation method, while the model-free methods based on the sigmoid function, Gaussian function, and Poisson distribution are linear and fast to generate reasonable predictions. According to the numerical experiments and the realities, the special policies for controlling the disease are successful in some provinces, and the transmission of the epidemic, whose outbreak time is close to the beginning of China Spring Festival travel rush, is more likely to decelerate before February 18 and to end before April 2020. The proposed mathematical and artificial intelligence methods can give consistent and reasonable predictions of the 2019-nCoV ending. We anticipate our work to be a starting point for comprehensive prediction researches of the 2019-nCoV.
As the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we re-analyzed the computational approaches and findings presented in two recent manuscripts by Ji et al. (https://doi.org/10.1002/jmv.25682) and by Pradhan et al. (https://doi.org/10.1101/2020.01.30.927871), which concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions shared a unique similarity to HIV-1. Results from our re-implementation of the analyses, built on larger-scale datasets using state-of-the-art bioinformatics methods and databases, do not support the conclusions proposed by these manuscripts. Based on our analyses and existing data of coronaviruses, we concluded that the intermediate hosts of 2019-nCoV are more likely to be mammals and birds than snakes, and that the novel insertions observed in the spike protein are naturally evolved from bat coronaviruses.
Objectives.--To estimate the basic reproduction number of the Wuhan novel coronavirus (2019-nCoV). Methods.--Based on the susceptible-exposed-infected-removed (SEIR) compartment model and the assumption that the infectious cases with symptoms occurred before January 25, 2020 are resulted from free propagation without intervention, we estimate the basic reproduction number of 2019-nCoV according to the reported confirmed cases and suspected cases, as well as the theoretical estimated number of infected cases by other research teams, together with some epidemiological determinants learned from the severe acute respiratory syndrome. Results The basic reproduction number falls between 2.8 to 3.3 by using the real-time reports on the number of 2019-nCoV infected cases from Peoples Daily in China, and falls between 3.2 and 3.9 on the basis of the predicted number of infected cases from colleagues. Conclusions.--The early transmission ability of 2019-nCoV is closed to or slightly higher than SARS. It is a controllable disease with moderate-high transmissibility. Timely and effective control measures are needed to suppress the further transmissions. Notes Added.--Using a newly reported epidemiological determinants for early 2019-nCoV, the estimated basic reproduction number is in the range [2.2,3.0].
A common problem in bioinformatics is related to identifying gene regulatory regions marked by relatively high frequencies of motifs, or deoxyribonucleic acid sequences that often code for transcription and enhancer proteins. Predicting alignment scores between subsequence k-mers and a given motif enables the identification of candidate regulatory regions in a gene, which correspond to the transcription of these proteins. We propose a one-dimensional (1-D) Convolution Neural Network trained on k-mer formatted sequences interspaced with the given motif pattern to predict pairwise alignment scores between the consensus motif and subsequence k-mers. Our model consists of fifteen layers with three rounds of a one-dimensional convolution layer, a batch normalization layer, a dense layer, and a 1-D maximum pooling layer. We train the model using mean squared error loss on four different data sets each with a different motif pattern randomly inserted in DNA sequences: the first three data sets have zero, one, and two mutations applied on each inserted motif, and the fourth data set represents the inserted motif as a position-specific probability matrix. We use a novel proposed metric in order to evaluate the models performance, $S_{alpha}$, which is based on the Jaccard Index. We use 10-fold cross validation to evaluate out model. Using $S_{alpha}$, we measure the accuracy of the model by identifying the 15 highest-scoring 15-mer indices of the predicted scores that agree with that of the actual scores within a selected $alpha$ region. For the best performing data set, our results indicate on average 99.3% of the top 15 motifs were identified correctly within a one base pair stride ($alpha = 1$) in the out of sample data. To the best of our knowledge, this is a novel approach that illustrates how data formatted in an intelligent way can be extrapolated using machine learning.