Next-generation sequencing technology enables routine detection of bacterial pathogens for clinical diagnostics and genetic research. Whole genome sequencing has been important in the epidemiologic analysis of bacterial pathogens. However, few whole genome sequencing-based genotyping pipelines are available for practical applications. Here, we present a whole genome sequencing-based single nucleotide polymorphism (SNP) genotyping method and apply it to the evolutionary analysis of methicillin-resistant Staphylococcus aureus. The SNP genotyping method calls genome variants from next-generation sequencing reads of whole genomes and calculates the pairwise Jaccard distances between the genome variants. The method may reveal high-resolution whole genome SNP profiles and structural variants of different isolates of methicillin-resistant S. aureus (MRSA) and methicillin-susceptible S. aureus (MSSA) strains. Phylogenetic analysis of whole genomes and of particular regions may help monitor and track the evolution and transmission dynamics of bacterial pathogens. The computer programs of the whole genome sequencing-based SNP genotyping method are available to the public at https://github.com/cyinbox/NGS.
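The pairwise Jaccard distance step described above can be sketched as follows. This is a minimal illustration, not the code from the cited repository: it assumes each isolate's variant calls are already available as a set of hypothetical (position, reference base, alternate base) tuples, and the isolate names and variants are made up for the example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two collections of variants.

    Defined as 1 - |A intersect B| / |A union B|. Two empty sets are
    treated as identical (distance 0.0), which is a convention chosen here.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(variants_by_isolate):
    """Return {(isolate_x, isolate_y): distance} for every unordered pair."""
    names = sorted(variants_by_isolate)
    return {
        (x, y): jaccard_distance(variants_by_isolate[x], variants_by_isolate[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical variant calls: (position, reference base, alternate base).
variants = {
    "isolate_A": {(1021, "A", "G"), (5310, "C", "T"), (88012, "G", "A")},
    "isolate_B": {(1021, "A", "G"), (5310, "C", "T")},
    "isolate_C": {(77, "T", "C")},
}
distances = pairwise_distances(variants)
```

The resulting distance dictionary could be passed to any standard clustering or tree-building routine; which one the authors use is not stated in the abstract.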
We calculate the mutual information function for each of the 24 chromosomes in the human genome. The same correlation pattern is observed regardless of the individual functional features of each chromosome. Moreover, correlations at different length scales are detected, depicting a multifractal scenario. This fact suggests a unique mechanism of structural evolution. We propose that such a mechanism could be an expansion-modification dynamical system.
The dissertation focuses on associations between functional elements in the human genome and their nucleotide structure. The asymmetry in nucleotide content (skew, bias) was chosen as the main feature of nucleotide structure. A significant difference in nucleotide content asymmetry was found between human exons and introns. Specifically, exon sequences display a bias for purines (i.e., an excess of A and G over C and T), while introns exhibit a keto-amino skew (i.e., an excess of G and T over A and C). The extent of these biases depends on gene expression patterns. The highest intronic keto-amino skew is found in the introns of housekeeping genes. In introns, whose sequences are under a weak repair system, AT->GC and CG->TA substitutions accumulate preferentially. A comparative analysis of the gene sequences encoding cytochrome P450 2E1 of Homo sapiens and representative mammals was performed. A cladistic tree based on the similarity of the coding sequences of the gene Cyp2e1 was constructed. New programming tools for mining and analyzing sequences from the NCBI database were developed, resulting in the construction of our own database.
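The two asymmetry measures defined above can be illustrated with a short sketch. The normalization by sequence length is an assumption made here for the example; the dissertation may normalize the skews differently.

```python
def nucleotide_skews(seq):
    """Return purine and keto skews of a DNA sequence.

    purine skew = ((A + G) - (C + T)) / N   (positive: excess of purines)
    keto skew   = ((G + T) - (A + C)) / N   (positive: excess of keto bases)
    N counts only A, C, G and T; other symbols (e.g. N) are ignored.
    """
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    n = a + c + g + t
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "purine": ((a + g) - (c + t)) / n,
        "keto": ((g + t) - (a + c)) / n,
    }


# Toy sequences, not real exon or intron data.
exon_like = nucleotide_skews("AAGGAGGCAT")
intron_like = nucleotide_skews("GTTGGTTTAC")
```

Under these definitions, a purine-rich toy sequence yields a positive purine skew and a G/T-rich one yields a positive keto skew, mirroring the exon and intron tendencies described in the abstract.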
The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against four antibiotics. Our results demonstrate that extremely sparse, biologically relevant models can be learnt using this approach.
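To make the k-mer representation concrete, the sketch below turns a few toy genomes into presence/absence feature vectors. It covers only the representation step; the Set Covering Machine itself is not implemented, and the function names, genome names, and value of k are hypothetical choices for illustration.

```python
def kmer_set(genome, k):
    """Set of all overlapping substrings of length k in the genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def presence_matrix(genomes, k):
    """Build boolean presence/absence features over the observed k-mers.

    Returns (vocabulary, rows), where vocabulary is the sorted list of all
    k-mers seen in any genome and rows maps each genome name to a list of
    booleans aligned with that vocabulary.
    """
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*sets.values()))
    rows = {
        name: [kmer in present for kmer in vocabulary]
        for name, present in sets.items()
    }
    return vocabulary, rows


# Toy genomes; real genomes are millions of bases long.
genomes = {"g1": "ACGTAC", "g2": "ACGTTT"}
vocabulary, rows = presence_matrix(genomes, k=3)
```

A rule-based learner can then express a model as a short conjunction or disjunction of "k-mer present" or "k-mer absent" tests over these columns, which is what makes the resulting models sparse and readable.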
Anti-staphylococcal penicillins (ASPs) are recommended as first-line agents in methicillin-susceptible Staphylococcus aureus (MSSA) bacteraemia. Concerns about their safety profile have contributed to the increased use of cefazolin. The comparative clinical effectiveness and safety profile of cefazolin versus ASPs for such infections remain unclear. Furthermore, uncertainty persists concerning the use of cefazolin due to controversies over its efficacy in deep MSSA infections and its possible negative ecological impact.
The emerging global infectious disease COVID-19, caused by the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has presented critical threats to global public health and the economy since it was identified in late December 2019 in China. The virus has gone through various pathways of evolution. For understanding the evolution and transmission of SARS-CoV-2, genotyping of virus isolates is of great importance. We present an accurate method for effectively genotyping SARS-CoV-2 viruses using complete genomes. The method employs multiple sequence alignments of the genome isolates with the SARS-CoV-2 reference genome. The SNP genotypes are then compared by Jaccard distances to track the relationships among virus isolates. The genotyping analysis of SARS-CoV-2 isolates from around the globe reveals that specific multiple mutations are the predominant mutation type during the current epidemic. Our method serves as a promising tool for monitoring and tracking epidemics of pathogenic viruses through their gradual and local genetic variations. The genotyping analysis shows that the genes encoding the S protein, RNA polymerase, RNA primase, and nucleoprotein undergo frequent mutations. These mutations are critical for vaccine development in disease control.
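The step of deriving an SNP genotype from an alignment against the reference genome can be sketched as below. This is an illustrative assumption of how such a step might look, not the authors' implementation: it takes two already-aligned, equal-length strings, reports positions as 1-based alignment columns, and skips gaps and ambiguous bases, all of which are choices made for this example.

```python
def snp_genotype(reference, aligned, gap="-", unknown="N"):
    """Collect substitutions of an aligned isolate relative to the reference.

    Both inputs must be rows of the same alignment (equal length).
    Returns a set of (column, reference_base, isolate_base) tuples with
    1-based alignment columns. Columns with a gap or an ambiguous base
    in either sequence are ignored.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    skipped = {gap, unknown}
    return {
        (column, ref_base, iso_base)
        for column, (ref_base, iso_base) in enumerate(
            zip(reference.upper(), aligned.upper()), start=1
        )
        if ref_base != iso_base
        and ref_base not in skipped
        and iso_base not in skipped
    }


# Toy alignment rows, not real SARS-CoV-2 sequence.
reference_row = "ACGTACGT"
isolate_row = "ACTTAC-N"
genotype = snp_genotype(reference_row, isolate_row)
```

Each isolate's genotype is then a set of substitutions, a form that lends itself to set-based comparisons such as the Jaccard distance mentioned in the abstract.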